W. T. Gowers

Abstract. Babai and Sós have asked whether there exists a constant c > 0 such that
every finite group G has a product-free subset of size at least c|G|: that is, a subset X
that does not contain three elements x, y and z with xy = z. In this paper we show that
the answer is no. Moreover, we give a simple sufficient condition for a group not to have
any large product-free subset.

§1. Introduction.

The starting point for this paper is a well-known result of Erdős, which states that
for every n-element subset X of $\mathbb{Z}$ there is a subset $Y \subset X$ of size at least n/3 that is
sum-free, in the sense that if $y_1$ and $y_2$ belong to Y then $y_1 + y_2$ does not belong to Y.
The proof is so simple that it can be given in full here. First, choose a prime p such that
X lives in the interval $[-p/3, p/3]$. A subset $Y \subset X$ is then sum-free if and only if it is
sum-free mod p. But if r is any integer not congruent to 0 mod p, then Y is sum-free mod
p if and only if rY is sum-free mod p. Moreover, a simple averaging argument shows that
one can find r such that at least a third of the elements of rX lie in the interval $[p/3, 2p/3]$
mod p. Therefore, X has a subset Y of size at least n/3 such that rY, and hence Y, is
sum-free.
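
The dilation argument above can be tried out directly. The following is a minimal sketch, assuming that X consists of distinct non-zero integers and using the open interval (p/3, 2p/3), which is sum-free mod p; the function names `is_prime` and `sum_free_subset` are illustrative and do not come from the paper.

```python
def is_prime(m):
    """Trial-division primality test; adequate for the small primes used here."""
    if m < 2:
        return False
    d = 2
    while d * d <= m:
        if m % d == 0:
            return False
        d += 1
    return True


def sum_free_subset(xs):
    """Return a sum-free subset of xs of size at least len(xs)/3.

    Assumption: xs is a collection of distinct non-zero integers.
    """
    xs = list(xs)
    # Choose a prime p such that every element of xs lies in [-p/3, p/3].
    p = 3 * max(abs(x) for x in xs) + 1
    while not is_prime(p):
        p += 1
    best = []
    for r in range(1, p):
        # Keep the x whose dilate r*x lies strictly between p/3 and 2p/3 mod p.
        ys = [x for x in xs if p < 3 * ((r * x) % p) < 2 * p]
        if len(ys) > len(best):
            best = ys
    return best


X = list(range(1, 31))
Y = sum_free_subset(X)
```

Trying every r rather than averaging is a design choice for clarity: the averaging argument guarantees that the best r keeps at least a third of X.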

Using the classification of Abelian groups it is easy to see that the same result holds if
X is a subset of an Abelian group, but the situation for non-Abelian groups is less clear. In
1985, Babai and Sós [2] noted that if H is a subgroup of G of index k, then any non-trivial
coset of H is product-free. From the classification of finite simple groups it can be shown
that every finite simple group of order n has a subgroup of index at most $Cn^{3/7}$ and hence
a product-free set of size at least $cn^{4/7}$. Combining that with the fact that a product-free
subset of a quotient of G lifts to a product-free subset of G, one can deduce the same result
for all finite groups. In 1997, Kedlaya [11] (see also [12]) improved this bound to $cn^{11/14}$
by showing that if H has index k then one can find a union of $ck^{1/2}$ cosets of H, a large
subset of which is product-free.

In the other direction, nothing much was known. Indeed, Babai and Sós asked whether
the lower bound could be improved to cn for some positive constant c, and Kedlaya repeated
the question, while also asking the weaker question of whether, for every $\epsilon > 0$, one can
obtain a bound of $c(\epsilon)n^{1-\epsilon}$. This paper answers these questions in the negative, by showing
that, for sufficiently large q, the group $\mathrm{PSL}_2(q)$ has no product-free subset of size $Cn^{8/9}$,
where n is the order of $\mathrm{PSL}_2(q)$. In fact, we prove the stronger result that if A, B and C are
three subsets of $\mathrm{PSL}_2(q)$ of size at least $Cn^{8/9}$, then there is a triple $(a, b, c) \in A \times B \times C$
such that ab = c.

The proof has three stages. First, we briefly review some facts about quasirandom
bipartite graphs and quasirandom subsets of groups; detailed proofs of most of these can
be found elsewhere, and we give simple proofs of those that cannot. Secondly, we prove
that the "bipartite Cayley graph" associated with $\mathrm{PSL}_2(q)$ and one of the three sets under
consideration is quasirandom. Finally, we show that this quasirandomness immediately
implies our result.

Having proved this theorem, we step back and look at what we have done from a
more abstract point of view. The property of $\mathrm{PSL}_2(q)$ that makes it suitable for results of
this kind is that it has no non-trivial irreducible representations of low dimension. This
property has been used in a similar way before: it is due to Sarnak and Xue [16]. It was
also used in [7] to prove that the famous Ramanujan graphs of Lubotzky, Phillips and
Sarnak [14] are expanders (this is a weaker result than that of [14] but the proof is much
easier), and it has recently been used by Bourgain and Gamburd [4] to show the same for
certain other Cayley graphs.

Our main result is rather easier than theirs. However, this very fact may make it 
useful to readers who do not have a background in representation theory and who would 
like to see how information about representations can be used. If a group has no
non-trivial low-dimensional representations, it seems appropriate to call it quasirandom since,
as we show later in the paper, this property is equivalent to several other properties, some 
of which state that certain associated graphs are quasirandom. Once we have stated and 
proved various equivalences of this kind, we prove some further results. The first of these is 
a partial converse to our main theorem: if a finite group G contains no large product-free 
subset, then it is quasirandom. The reason this is a "partial" converse is that the bounds 
we obtain are not very good: for most of the results in the paper there is a power-type 
dependence of one constant on another, but for this one it is exponential/logarithmic. 

Section 4 ends with another weak equivalence. It is easy to prove that a group is 
not quasirandom if it has a non-trivial quotient that is either Abelian or of small order. 
We show that, in the absence of these obvious obstructions, a group G is quasirandom. 
In particular, non-Abelian finite simple groups are quasirandom. Again, we obtain
exponential/logarithmic bounds, but for this result it is unavoidable because the dimension of
the smallest non-trivial representation is a power of n for some finite simple groups and
logarithmic in n for others.

In Section 5 we prove a generalization of the main theorem to more complicated sets 
of equations. The theorem itself allows one to place a, b and ab into specified dense subsets 
of a quasirandom group. It turns out that one can do the same with more variables: for 
example, the next case says that a, b, c, ab, bc, ac and abc can be placed into specified
sets. 

The final section of this paper collects together some open problems that have arisen 
during the paper, and adds a few more. 

§2. Quasirandom graphs and sets. 

As promised, let us briefly review some of the standard theory of quasirandomness, 
concentrating in particular on the definitions of a quasirandom graph, a quasirandom 
bipartite graph and of a quasirandom subset of an Abelian group. The first few results of 
this section will not be used later, so we shall not give their proofs. However, they put the 
later results into their proper context. 

The notion of a quasirandom graph was introduced by Chung, Graham and Wilson
[6], though a similar notion (of so-called "jumbled" graphs) had been defined by Thomason
[17]. If x is a vertex in a graph, we shall write $N_x$ for its neighbourhood. The adjacency
matrix A of a graph G is defined by A(x, y) = 1 if xy is an edge of G and A(x, y) = 0
otherwise.

Theorem 2.1. Let G be a graph with n vertices and density p. Then the following 
statements are polynomially equivalent, in the sense that if one statement holds for a 
constant c, then all others hold with constants that are bounded above by a positive power 
of c. 

(i) $\sum_{x,y \in V(G)} |N_x \cap N_y|^2 \le (p^4 + c_1)n^4$.

(ii) The number of labelled 4-cycles in G is at most $(p^4 + c_1)n^4$.

(iii) For any two subsets $A, B \subset V(G)$ the number of pairs $(x, y) \in A \times B$ such that
$xy \in E(G)$ differs from $p|A||B|$ by at most $c_2 n^2$.

(iv) The second largest modulus of an eigenvalue of the adjacency matrix of G is at
most $c_3 n$.

A graph that satisfies one, and hence all, of these properties for a small c is called
quasirandom. If one wishes to be more precise, then one can say that G is c-quasirandom
if it satisfies property (i) (or equivalently (ii)) with constant $c_1 = c$. A random graph with
edge probability p is almost always quasirandom with small c, and quasirandom graphs
have many properties that random graphs have. In particular, if H is any fixed small
graph, and $\phi$ is a random map from V(H) to V(G), then the probability that $\phi(x)\phi(y)$
is an edge of G whenever xy is an edge of H (in which case $\phi$ is a homomorphism) is
roughly what one would expect, namely $p^{|E(H)|}$, and the probability that in addition no
non-edge of H maps to an edge of G (in which case $\phi$ is an isomorphic embedding) is
roughly $p^{|E(H)|}(1 - p)^{\binom{|V(H)|}{2} - |E(H)|}$.
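
To make property (i) concrete, here is a small sketch that evaluates its left-hand side for the Paley graph on 13 vertices. The choice of graph and all function names are illustrative assumptions rather than anything taken from the paper. The count of closed walks of length 4 (which includes degenerate 4-cycles) is computed separately to show that it agrees with the sum in (i).

```python
def paley_graph(q):
    """Adjacency sets of the Paley graph on Z/qZ (q prime with q % 4 == 1)."""
    squares = {(t * t) % q for t in range(1, q)}
    return {x: {y for y in range(q) if (y - x) % q in squares} for x in range(q)}


def codegree_square_sum(nbrs):
    """Left-hand side of (i): sum over ordered pairs (x, y) of |N_x & N_y|^2."""
    return sum(len(nbrs[x] & nbrs[y]) ** 2 for x in nbrs for y in nbrs)


def closed_four_walks(nbrs):
    """Count quadruples (x, a, y, b) with xa, ay, yb and bx all edges."""
    total = 0
    for x in nbrs:
        for a in nbrs[x]:
            for y in nbrs[a]:
                total += len(nbrs[y] & nbrs[x])
    return total


q = 13
N = paley_graph(q)
p = sum(len(s) for s in N.values()) / q ** 2   # density, counting ordered pairs
lhs = codegree_square_sum(N)
c1 = lhs / q ** 4 - p ** 4                     # smallest c_1 for which (i) holds
```

For this graph the excess `c1` is below 0.01, so the graph is quasirandom with a small constant even on so few vertices.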

A quasirandom bipartite graph is like a quasirandom graph but with some obvious 
modifications. As above, we state a theorem that serves as a definition as well. 



Theorem 2.2. Let G be a bipartite graph with vertex sets X and Y and $p|X||Y|$ edges.
Then the following statements are polynomially equivalent.

(i) $\sum_{x,x' \in X} |N_x \cap N_{x'}|^2 \le (p^4 + c_1)|X|^2|Y|^2$.

(ii) $\sum_{y,y' \in Y} |N_y \cap N_{y'}|^2 \le (p^4 + c_1)|X|^2|Y|^2$.

(iii) The number of labelled 4-cycles that start in X is at most $(p^4 + c_1)|X|^2|Y|^2$.

(iv) For any two subsets $A \subset X$ and $B \subset Y$ the number of pairs $(x, y) \in A \times B$ such
that $xy \in E(G)$ differs from $p|A||B|$ by at most $c_2|X||Y|$.



We call a bipartite graph c-quasirandom if it satisfies condition (i) (and therefore the
exactly equivalent conditions (ii) and (iii)) with constant $c_1 = c$.

Note that we have not given an eigenvalue condition. This is because the bipartite
adjacency matrix (that is, the obvious 01-function defined on $X \times Y$ as opposed to $(X \cup Y)^2$)
is not symmetric. However, as we shall see later, there is a natural analogue of this
condition.

To continue our quick survey of known results, let us define quasirandom subsets of
Abelian groups. This is a straightforward generalization of a definition of Chung and
Graham [5] for the case $\mathbb{Z}/p\mathbb{Z}$. Again, we present it as a theorem rather than a definition.
Recall that if G is an Abelian group, f is a function from G to $\mathbb{C}$ and $\gamma : G \to \mathbb{C}$ is a
character of G, then the Fourier transform of f, evaluated at $\gamma$, is the number
$\hat{f}(\gamma) = \sum_{g \in G} f(g)\gamma(g)$. If $f_1$ and $f_2$ are two functions defined on G, then their convolution
$f_1 * f_2$ is defined by $f_1 * f_2(g) = \sum_{x+y=g} f_1(x)f_2(y)$. If A is a subset of G we shall use
the letter A also for the characteristic function of A. That is, A(x) = 1 if $x \in A$ and
0 otherwise.

Theorem 2.3. Let G be an Abelian group of order n and let $A \subset G$ be a set of size pn.
Then the following are equivalent.

(ii) There are at most $(p^4 + c_1)n^3$ solutions in A of the equation x + y = z + w.

(iii) $\sum_{g \in G} |A * A(g)|^2 \le (p^4 + c_1)n^3$.

(iv) For every subset $B \subset G$, $\sum_{g \in G} |A * B(g)|^2 \le n^{-1}|A|^2|B|^2 + c_2 n^3$.

(v) The graph with vertex set G and with x joined to y if and only if $x + y \in A$ is
$c_1$-quasirandom.

(vi) The bipartite graph with two copies of G as its vertex sets and with x joined to
y if and only if $y - x \in A$ is $c_1$-quasirandom.

(vii) $|\hat{A}(\gamma)| \le c_3 n$ for all non-trivial characters $\gamma$.

It is often convenient to replace Theorems 2.2 and 2.3 with "functional" or "analytic" 
versions, as follows. 

Theorem 2.4. Let X and Y be two finite sets and let $f : X \times Y \to \mathbb{C}$ be a function that
takes values of modulus at most 1. Then the following properties of f are polynomially
equivalent.

(i) $\sum_{x,x' \in X} \sum_{y,y' \in Y} f(x,y)\overline{f(x,y')}\,\overline{f(x',y)}f(x',y') \le c_1|X|^2|Y|^2$.

(ii) For any two functions $u : X \to \mathbb{C}$ and $v : Y \to \mathbb{C}$ taking values of modulus at
most 1,

$|\sum_{x,y} f(x,y)u(x)v(y)| \le c_2|X||Y|$.

(iii) For any two sets $A \subset X$ and $B \subset Y$,

$|\sum_{x \in A} \sum_{y \in B} f(x,y)| \le c_3|X||Y|$.

A function f with one, and hence all three, of the above properties is called quasirandom.
More precisely, we call it c-quasirandom if property (i) holds with constant c.

Theorem 2.4 is closely related to Theorem 2.2. Indeed, if G is a bipartite graph with
vertex sets X and Y and density p, then G is quasirandom if and only if the function
f(x,y) = G(x,y) - p is quasirandom, where we have written G for the characteristic
function of the graph as well (so f(x,y) is 1 - p if (x,y) is an edge and -p otherwise).
This is particularly easy to show if G is regular, in the sense that every vertex in X has
degree $p|Y|$ and every vertex in Y has degree $p|X|$. Then a quick calculation shows that
G is c-quasirandom if and only if f is c-quasirandom.

Now let us give a functional version of Theorem 2.3. Instead of trying to give as many
equivalences as possible, we shall restrict our attention to ones that will be of interest later 
(in Section 4, when we come to define quasirandom groups). These apply to subsets of an 
arbitrary group. They are not deep equivalences, as one might suspect from the fact that 
they all hold with the same constant. 

Theorem 2.5. Let G be a group of order n and let $f : G \to \mathbb{C}$ be a function taking values
of modulus at most 1. Then the following are exactly equivalent.

(i) $\sum_{y \in G} |\sum_{x \in G} f(x)\overline{f(yx)}|^2 \le cn^3$.

(ii) $\sum_{ab^{-1} = cd^{-1}} f(a)\overline{f(b)}\,\overline{f(c)}f(d) \le cn^3$.

(iii) The function $F(x, y) = f(xy^{-1})$ is a c-quasirandom function on $G \times G$.

Proof. To see that (i) and (ii) are equivalent, note that the sum on the left-hand side of
(i) is equal to

$\sum_{x,y,z \in G} f(x)\overline{f(yx)}\,\overline{f(z)}f(yz)$.

The result now follows from the obvious one-to-one correspondence between quadruples
(a, b, c, d) such that $ab^{-1} = cd^{-1}$ and quadruples of the form (x, yx, z, yz).

To see that (ii) and (iii) are equivalent, note that

$\sum_{x,x'} \sum_{y,y'} F(x,y)\overline{F(x,y')}\,\overline{F(x',y)}F(x',y') = \sum_{x,x'} \sum_{y,y'} f(xy^{-1})\overline{f(xy'^{-1})}\,\overline{f(x'y^{-1})}f(x'y'^{-1})$.

Now for each x, x', y and y' we have $(xy^{-1})(x'y^{-1})^{-1} = (xy'^{-1})(x'y'^{-1})^{-1}$. In the other
direction, if $ab^{-1} = cd^{-1}$ and g is any group element, then let y = g, x = ag, $y' = c^{-1}ag$
and $x' = dc^{-1}ag = bg$. Then $xy^{-1} = a$, $x'y^{-1} = b$, $xy'^{-1} = c$ and $x'y'^{-1} = d$. This
gives us an n-to-one correspondence between quadruples $(xy^{-1}, x'y^{-1}, xy'^{-1}, x'y'^{-1})$ and
quadruples (a, b, c, d) such that $ab^{-1} = cd^{-1}$, which proves that (ii) holds if and only if

$\sum_{x,x'} \sum_{y,y'} F(x,y)\overline{F(x,y')}\,\overline{F(x',y)}F(x',y') \le cn^4$,

that is, if and only if (iii) holds. □
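
The exact equivalences in Theorem 2.5 can be checked by brute force on a small non-Abelian group. The sketch below uses $S_3$ and an arbitrarily chosen test function; both are illustrative assumptions, and the complex conjugates are placed as in the standard expansion of the squared modulus in (i).

```python
from itertools import permutations, product

G = list(permutations(range(3)))        # the group S_3, written as tuples
n = len(G)


def mul(a, b):
    """Composition of permutations: (a*b)(i) = a[b[i]]."""
    return tuple(a[i] for i in b)


def inv(a):
    """Inverse permutation."""
    out = [0] * len(a)
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)


# A hypothetical test function with values of modulus at most 1.
f = dict(zip(G, [1, -1, 1j, -1j, 0, 1]))

# (i): sum over y of |sum over x of f(x) conj(f(yx))|^2
lhs_i = sum(abs(sum(f[x] * f[mul(y, x)].conjugate() for x in G)) ** 2 for y in G)

# (ii): sum over quadruples (a, b, c, d) with a b^-1 = c d^-1
lhs_ii = sum(
    f[a] * f[b].conjugate() * f[c].conjugate() * f[d]
    for a, b, c, d in product(G, repeat=4)
    if mul(a, inv(b)) == mul(c, inv(d))
)

# (iii): the four-fold sum for F(x, y) = f(x y^-1)
F = {(x, y): f[mul(x, inv(y))] for x in G for y in G}
lhs_iii = sum(
    F[x, y] * F[x, y2].conjugate() * F[x2, y].conjugate() * F[x2, y2]
    for x, x2, y, y2 in product(G, repeat=4)
)
```

The sums in (i) and (ii) coincide, and the sum in (iii) is exactly n times larger, which is the n-to-one correspondence used in the proof.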

If these properties hold (as well as the hypotheses of the theorem) then we shall say
that f is c-quasirandom. For more details about quasirandom graphs, sets and functions,
including proofs of most of the previous results, the reader is referred to the early sections
of [9]. (This is by no means the only reference, but is chosen because the presentation
there harmonizes well with the presentation in this paper.)

Let us now return to the question of a "spectral theory" for bipartite graphs. For an
ordinary graph G, one observes that the adjacency matrix is symmetric and can therefore
be decomposed as $\sum_{i=1}^n \lambda_i u_i \otimes u_i$ for some orthonormal basis $(u_i)$ of eigenvectors, with
$\lambda_i$ the eigenvalue corresponding to $u_i$. (Here we write $u \otimes v$ for the matrix that takes
the value u(x)v(y) at (x,y). If v and w are elements of inner product spaces V and W,
then we write $w \otimes v$ for the linear map from V to W defined by $x \mapsto \langle x, v \rangle w$. Notice
that these two definitions are consistent.) For a bipartite graph, the adjacency matrix is
no longer symmetric, so this result is no longer true. However, what we can do instead is
decompose it as a sum $\sum_{i=1}^n \lambda_i u_i \otimes v_i$, where $(u_i)$ and $(v_i)$ are two orthonormal bases. This
is called the singular value decomposition of the matrix, which was discovered in the late
19th century and is important in numerical analysis. For the convenience of the reader,
we give a proof that it always exists (in the real case).

Theorem 2.6. Let $\alpha$ be any linear map from a real inner product space V to a real inner
product space W. Then $\alpha$ has a decomposition of the form $\sum_{i=1}^k \lambda_i w_i \otimes v_i$, where the
sequences $(w_i)$ and $(v_i)$ are orthonormal in W and V, respectively, each $\lambda_i$ is non-negative,
and k is the smaller of dim V and dim W.

Proof. To begin, let v be a non-zero vector such that $\|\alpha v\|/\|v\|$ is maximized. (For this
proof, $\|.\|$ is the standard Euclidean norm and $\langle , \rangle$ the standard inner product, either on
$\mathbb{R}^m$ or $\mathbb{R}^n$.) Now suppose that w is any vector orthogonal to v and let $\delta$ be a small real
number. Then $\|\alpha(v + \delta w)\|^2 = \|\alpha v\|^2 + 2\delta\langle \alpha v, \alpha w \rangle + o(\delta)$, and $\|v + \delta w\|^2 = \|v\|^2 + o(\delta)$.
It follows that $\langle \alpha v, \alpha w \rangle = 0$, since otherwise we could pick a small $\delta$ with the same sign
as $\langle \alpha v, \alpha w \rangle$ and we would find that $\|\alpha(v + \delta w)\|/\|v + \delta w\|$ was bigger than $\|\alpha v\|/\|v\|$.

Let X and Y be the subspaces of $\mathbb{R}^n$ and $\mathbb{R}^m$ orthogonal to v and $\alpha v$, respectively.
They can be given orthonormal bases, and $\alpha$ maps everything in X to Y. Let $\beta$ be the
restriction of $\alpha$ to X. By induction, $\beta$ has a decomposition of the required form. That
is, we can write $\beta = \sum_{i=2}^k \lambda_i w_i \otimes v_i$ with $v_i \in X$ and $w_i \in Y$. Now set $v_1 = v/\|v\|$,
$w_1 = \alpha v/\|\alpha v\| = \alpha v_1/\|\alpha v_1\|$ and $\lambda_1 = \|\alpha v_1\|$. Then $\alpha v_1 = \lambda_1 w_1$, from which it follows
that $\alpha = \sum_{i=1}^k \lambda_i w_i \otimes v_i$, as required. □

This theorem is of course equivalent to a very similar statement about matrices, and 
indeed that is how we shall apply it. 

The fact that singular values are the correct analogue of eigenvalues for bipartite
graphs has been realized before. See for example [3]. The next two results illustrate the
connection very clearly.

Lemma 2.7. Let G be a bipartite graph with vertex sets X and Y and identify G with its
bipartite adjacency matrix $\sum_{i=1}^k \lambda_i w_i \otimes v_i$, where $(v_i)$ and $(w_i)$ are orthonormal sequences.
Then $\sum_i \lambda_i^2$ is the number of edges in G and $\sum_i \lambda_i^4$ is the number of labelled 4-cycles that
start in X.

Proof. The number of edges in G is $\mathrm{tr}(G^T G)$. But $G^T$ is $\sum_i \lambda_i v_i \otimes w_i$. It is easy to verify
that $(v_i \otimes w_i)(w_j \otimes v_j) = \langle w_j, w_i \rangle v_i \otimes v_j$, which is zero unless i = j, and that
$\mathrm{tr}(v_i \otimes v_i) = 1$, so the first statement of the lemma follows.

The second part is similar. The number of labelled 4-cycles that start in X is
$\mathrm{tr}(G^T G G^T G)$. If we expand G and $G^T$ then once again the only terms that survive are
those that use a single i. But in this case we have four terms, so the answer is $\sum_i \lambda_i^4$. □
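
Since $\sum_i \lambda_i^2 = \mathrm{tr}(G^T G)$ and $\sum_i \lambda_i^4 = \mathrm{tr}(G^T G G^T G)$, the two counting statements can be checked without computing any singular values. The following sketch does this for a small random bipartite graph; the graph, its size and the helper names are illustrative assumptions, and degenerate 4-cycles are included in the count.

```python
import random

rng = random.Random(1)
X, Y = range(5), range(7)
# G[y][x] = 1 if xy is an edge (rows indexed by Y, columns by X); a random example.
G = [[rng.randint(0, 1) for _ in X] for _ in Y]


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def trace(M):
    return sum(M[i][i] for i in range(len(M)))


GT = [list(col) for col in zip(*G)]     # the transpose of G
M = matmul(GT, G)                       # G^T G, indexed by X x X

edges = sum(map(sum, G))
# Quadruples (x, y, x', y') with xy, x'y, x'y' and xy' all edges.
four_cycles = sum(G[y][x] * G[y][x2] * G[y2][x2] * G[y2][x]
                  for x in X for x2 in X for y in Y for y2 in Y)
```

Here `trace(M)` equals the number of edges and `trace(matmul(M, M))` equals the number of labelled 4-cycles that start in X.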

The next result gives a further condition that is equivalent to quasirandomness for 
regular bipartite graphs. 

Theorem 2.8. Let G be a regular bipartite graph with vertex sets X and Y and $p|X||Y|$ edges,
and identify G with its bipartite adjacency matrix. Then the following are polynomially
equivalent.

(i) G is $c_1$-quasirandom.

(ii) The maximum of $\|Gf\|/\|f\|$ over all non-zero functions f such that $\sum_{x \in X} f(x) = 0$
is at most $c_2|X|^{1/2}|Y|^{1/2}$.

Proof. By Theorem 2.6 we can write $G = \sum_{i=1}^k \lambda_i w_i \otimes v_i$ for orthonormal sequences $(v_i)$
and $(w_i)$. By Lemma 2.7, the number of labelled 4-cycles in G that start in X is $\sum_{i=1}^k \lambda_i^4$.
Suppose that the decomposition is chosen so that $w_1$ and $v_1$ are constant functions, which
implies that $\lambda_1 = p|X|^{1/2}|Y|^{1/2}$. Then, if (ii) holds, we find that

$\sum_{i=1}^k \lambda_i^4 \le p^4|X|^2|Y|^2 + c_2^2|X||Y| \sum_{i=2}^k \lambda_i^2$.

By Lemma 2.7, $\sum_{i=2}^k \lambda_i^2 \le p|X||Y|$, so this is at most $(p^4 + pc_2^2)|X|^2|Y|^2$, which establishes
(i) with $c_1 = pc_2^2$.

Conversely, if (i) holds, then $\sum_{i=1}^k \lambda_i^4 \le (p^4 + c_1)|X|^2|Y|^2$. Since $\lambda_1 = p|X|^{1/2}|Y|^{1/2}$,
it follows that every other $\lambda_i$ is at most $c_1^{1/4}|X|^{1/2}|Y|^{1/2}$. Since the maximum of these
other $\lambda_i$ is precisely the maximum in (ii), we have established (ii) with $c_2 = c_1^{1/4}$. □

The next lemma is a simple fact, but for our purposes it will be very important. In
the statement, if G is a bipartite graph with vertex sets X and Y of not necessarily the
same size, we call it regular if every vertex in X has the same degree and every vertex in
Y has the same degree.

Lemma 2.9. Let G be a regular bipartite graph with vertex sets X and Y. Let $\alpha$ be the
linear map from $\mathbb{C}^X$ to $\mathbb{C}^Y$ derived from the bipartite adjacency matrix of G. (That is, if
$f : X \to \mathbb{C}$ then $\alpha f(y) = \sum_{x \in X, xy \in E(G)} f(x)$.) Then the set of all functions $f : X \to \mathbb{C}$
such that $\sum_{x \in X} f(x) = 0$ and $\|\alpha f\|/\|f\|$ is maximized forms a linear subspace of $\mathbb{C}^X$.

Proof. Let us first check, using the regularity of G, that the maximum of $\|\alpha f\|/\|f\|$
over all functions is attained when f is a constant function. Let every vertex in X have
degree $p|Y|$, so that every vertex in Y has degree $p|X|$. Then, setting G(x,y) to be 1 if
$xy \in E(G)$ and 0 otherwise,

$\|\alpha f\|^2 = \sum_y |\sum_x f(x)G(x,y)|^2$
$= \sum_{x,x'} \sum_y f(x)\overline{f(x')}G(x,y)G(x',y)$
$\le \frac{1}{2} \sum_{x,x'} \sum_y (|f(x)|^2 + |f(x')|^2)G(x,y)G(x',y)$
$= \sum_x \sum_{x'} \sum_y |f(x)|^2 G(x,y)G(x',y)$
$= \sum_x |f(x)|^2 p^2|X||Y| = p^2|X||Y|\|f\|^2$.

It follows that $\|\alpha f\|/\|f\|$ never exceeds $p|X|^{1/2}|Y|^{1/2}$. This bound is attained when f is
the constant function 1: then $\|f\| = |X|^{1/2}$, and $\|\alpha f\| = p|X||Y|^{1/2}$ since $\alpha f$ takes the
value $p|X|$ everywhere on Y.

The proof of Theorem 2.6 now tells us that the restriction of the linear map $\alpha$ to
the space of functions that sum to zero can be decomposed as $\sum_{i=2}^n \lambda_i w_i \otimes v_i$. Without
loss of generality, $\lambda_2 \ge \dots \ge \lambda_n \ge 0$. Choose k such that $\lambda_2 = \dots = \lambda_k > \lambda_{k+1}$ and
let $X_0$ be the subspace of $\mathbb{C}^X$ generated by $v_2, \dots, v_k$. Then the restriction of $\alpha$ to $X_0$ is
$\lambda_2 \sum_{i=2}^k w_i \otimes v_i$. The map $\sum_{i=2}^k w_i \otimes v_i$ is orthogonal onto its image, so $\|\alpha f\| = \lambda_2\|f\|$ for every
$f \in X_0$. Since $\alpha(\sum_{i=2}^n \mu_i v_i) = \sum_{i=2}^n \lambda_i \mu_i w_i$, it is clear that $\|\alpha f\| < \lambda_2\|f\|$ whenever
$\sum_{x \in X} f(x) = 0$ and $f \notin X_0$. □

§3. A group with no large product-free subset. 

In this section we give a quick proof that the density of the largest product-free subset
of the group $\mathrm{PSL}_2(q)$ tends to zero as q tends to infinity. Recall that $\mathrm{PSL}_2(q)$ is the
2-dimensional projective special linear group over $\mathbb{F}_q$, that is, the group of all $2 \times 2$ matrices
over $\mathbb{F}_q$ with determinant 1, quotiented by the subgroup consisting of I and -I. It is
natural to look at this family of groups, since it is one of the simplest infinite families of
finite simple groups; simple groups themselves are natural to look at because if G' is a
quotient of a group G, then any product-free subset of G' lifts to a product-free subset
of G. As we have already mentioned, our proof will depend on one basic fact about
representations of $\mathrm{PSL}_2(q)$, which we state without proof.

Theorem 3.1. Every non-trivial representation of $\mathrm{PSL}_2(q)$ has dimension at least
(q - 1)/2. □

The proof of Theorem 3.1, due to Frobenius, is not especially hard, though it isn't trivial
either. A nice presentation of it can be found in [7]. To put this result in perspective, the
order of $\mathrm{PSL}_2(q)$ is $q(q^2 - 1)/2$, so the lowest dimension of a non-trivial representation is
proportional to the cube root of the order of the group. This tells us that, in a certain
sense, $\mathrm{PSL}_2(q)$ is very far from being Abelian.

As mentioned in the introduction, we shall in fact prove a result that is more general
in several ways. First of all, we shall prove it for any group $\Gamma$ that has no low-dimensional
non-trivial representation. Secondly, we shall prove an "off-diagonal" result: given any
three large subsets A, B and C of $\Gamma$, there is a triple $(a, b, c) \in A \times B \times C$ such that ab = c.
In order to prove this, it will be convenient (though not essential) to express the number
of such triples in terms of the following bipartite Cayley graph G. The two vertex sets of
G are copies of $\Gamma$ and xy is an edge if and only if there exists $a \in A$ such that ax = y.
(Note that if xy is an edge, it does not follow that yx is an edge; this is why we have
to consider bipartite graphs.) Then the number of triples we are trying to count is the
number of edges from the copy of B on one side of this bipartite graph to the copy of C
on the other. If $|\Gamma| = n$ and $r = |A|/n$, then we know from Theorem 2.2 that the number
of edges between these copies of B and C will be approximately $r|B||C|$ if G is sufficiently
quasirandom.
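
The identification of triples with edges of the bipartite Cayley graph can be seen in a small computation. The sketch below uses $S_4$ with randomly chosen sets of size 10; these choices are illustrative assumptions. Since $S_4$ has a non-trivial one-dimensional representation it is not a quasirandom group, so no closeness between the count and $r|B||C|$ is claimed here: only the equality of the two counts is being illustrated.

```python
from itertools import permutations
import random

Gamma = list(permutations(range(4)))    # the group S_4, purely as an illustration


def mul(a, b):
    """Composition of permutations: (a*b)(i) = a[b[i]]."""
    return tuple(a[i] for i in b)


rng = random.Random(0)
A, B, C = (set(rng.sample(Gamma, 10)) for _ in range(3))

# Edges of the bipartite Cayley graph: x (left copy) is joined to y (right copy)
# if and only if y = ax for some a in A.
edges = {(x, mul(a, x)) for a in A for x in Gamma}

triples = sum(1 for a in A for b in B if mul(a, b) in C)
edges_B_to_C = sum(1 for (x, y) in edges if x in B and y in C)
expected = len(A) * len(B) * len(C) / len(Gamma)   # r|B||C| with r = |A|/n
```

The graph has exactly $|A|n$ edges, and `triples` agrees with `edges_B_to_C`; `expected` is the value the count would be close to for a sufficiently quasirandom graph.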

We shall make this argument precise later in the section. But first, let us prove that 
the graph G actually is quasirandom. 

Lemma 3.2. Let $\Gamma$ be a finite group and suppose that $\Gamma$ has no non-trivial representation
of dimension less than k. Let A be any subset of $\Gamma$ and let G be the bipartite Cayley
graph defined above. Let $\alpha$ be the corresponding linear map defined in the statement of
Lemma 2.9. Let $f : \Gamma \to \mathbb{C}$ be any function such that $\sum_{x \in \Gamma} f(x) = 0$. Then
$\|\alpha f\|/\|f\| \le (|A|n/k)^{1/2}$.

Proof. Note first that, for any x and y in $\Gamma$, there exists $a \in A$ such that ax = y if and
only if $yx^{-1} \in A$. Thus, this is another way of stating which pairs xy are edges of G.
Writing A for the characteristic function of the set A, we now have

$\alpha f(y) = \sum_x G(x,y)f(x) = \sum_x A(yx^{-1})f(x) = \sum_{uv=y} A(u)f(v) = A * f(y)$,

where the last equality is true by the definition of the convolution of two functions defined
on an arbitrary group. That is, $\alpha f = A * f$.

Let $\lambda$ be the maximum of $\|\alpha f\|/\|f\|$ over all functions f that sum to zero, and let
X be the set of all functions f that achieve this maximum. Then X is a linear subspace
of $\mathbb{C}^\Gamma$, by Lemma 2.9 (of course, we count 0 as belonging to X). Now if we choose any
$f \in X$ and any group element $g \in \Gamma$, then the function $T_g f$, defined by $T_g f(x) = f(xg)$,
also belongs to X, since

$\alpha T_g f(u) = \sum_{xy=u} A(x)T_g f(y) = \sum_{xy=u} A(x)f(yg) = \sum_{xy=ug} A(x)f(y) = \alpha f(ug)$,

from which it follows that $\|\alpha T_g f\| = \|\alpha f\|$. Obviously, $\|T_g f\| = \|f\|$ as well.

Since any non-zero f in X is non-constant, there exists $g \in \Gamma$ such that $T_g f \ne f$,
from which it follows that the right-regular representation of $\Gamma$ acts non-trivially on X.
Therefore, the dimension of X is at least k, by hypothesis.

It follows from Theorem 2.6 and Lemma 2.7 that $k\lambda^2$ is at most the number of edges
in G, which is $|A|n$. That is, $\lambda \le (|A|n/k)^{1/2}$, as stated. □
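
The bound of Lemma 3.2 can be tested numerically on a small group. The sketch below is an illustration under stated assumptions: it takes $\Gamma = A_5$ (which is isomorphic to $\mathrm{PSL}_2(5)$), uses the standard fact that the smallest non-trivial representation of $A_5$ has dimension 3, and picks a random set A of size 30 and a random real function f summing to zero. None of these choices comes from the paper.

```python
from itertools import permutations
import random


def mul(a, b):
    """Composition of permutations: (a*b)(i) = a[b[i]]."""
    return tuple(a[i] for i in b)


def inv(a):
    """Inverse permutation."""
    out = [0] * len(a)
    for i, ai in enumerate(a):
        out[ai] = i
    return tuple(out)


def is_even(a):
    inversions = sum(1 for i in range(len(a)) for j in range(i + 1, len(a)) if a[i] > a[j])
    return inversions % 2 == 0


Gamma = [g for g in permutations(range(5)) if is_even(g)]   # A_5, of order 60
n, k = len(Gamma), 3      # assumption: k = 3 is the smallest non-trivial dimension

rng = random.Random(0)
A = rng.sample(Gamma, 30)

# A random real function on Gamma, shifted so that it sums to zero.
raw = {g: rng.uniform(-1, 1) for g in Gamma}
mean = sum(raw.values()) / n
f = {g: raw[g] - mean for g in Gamma}

# alpha f(y) = A * f(y) = sum over a in A of f(a^-1 y)
alpha_f = {y: sum(f[mul(inv(a), y)] for a in A) for y in Gamma}


def norm_sq(h):
    return sum(abs(v) ** 2 for v in h.values())


ratio_sq = norm_sq(alpha_f) / norm_sq(f)
bound_sq = len(A) * n / k        # (|A| n / k), the square of the bound in the lemma
```

With $|A| = 30$ the bound $|A|n/k = 600$ is smaller than the trivial bound $|A|^2 = 900$, so the check is not vacuous.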

We have shown that G satisfies condition (ii) of Theorem 2.8, with $c_2 = (|A|/kn)^{1/2}$.
This may make it look as though G becomes more quasirandom as the cardinality
of A decreases, but that is just an accident arising from the way the condition is formulated.
The point is that when A is smaller, the graph is less dense, which makes it hard for $c_2$ to
be small enough for condition (iv) of Theorem 2.2 to say anything non-trivial.

Nevertheless, we have more or less proved the main result of this paper. All that 
remains is to put together the results we have stated or proved already. 

Theorem 3.3. Let $\Gamma$ be a finite group with no non-trivial representation of dimension less
than k, let $n = |\Gamma|$ and let A, B and C be three subsets of $\Gamma$ such that $|A||B||C| > n^3/k$.
Then there exist $a \in A$, $b \in B$ and $c \in C$ with ab = c. In particular, this is true if all of A, B
and C have size greater than $n/k^{1/3}$. Furthermore, if $\eta > 0$ and $|A||B||C| > n^3/\eta^2 k$, then
the number of triples $(a, b, c) \in A \times B \times C$ such that ab = c is at least $(1 - \eta)|A||B||C|/n$.

Proof. Let $|A| = rn$, $|B| = sn$ and $|C| = tn$. As in the previous lemma, let $\alpha$ be the
linear map $f \mapsto A * f$. Let B stand for the characteristic function of the set B, and for
each $x \in \Gamma$ let f(x) = B(x) - s. Then $\sum_x f(x) = 0$, and $\|f\|^2 = (1 - s)^2|B| + s^2(n - |B|) =
s(1 - s)n \le sn$.

It follows from Lemma 3.2 that $\|\alpha f\|^2 \le rn^2 sn/k$. But $A * B(y) = A * (f + s)(y) =
\alpha f(y) + rsn$, so whenever $A * B(y) = 0$ we have $|\alpha f(y)| = rsn$. It follows that the number
m of y for which $A * B(y) = 0$ satisfies the inequality $m(rsn)^2 \le rsn^3/k$, or $m \le n/rsk$.
But if rst > 1/k then this is less than tn, which implies that there exists $c \in C$ such that
$A * B(c) \ne 0$. Equivalently, there exist $a \in A$ and $b \in B$ such that ab = c, as claimed.

As for the final claim, the number of triples in question is $\langle A * B, C \rangle = \langle \alpha f, C \rangle + rsn|C|$.
But $|\langle \alpha f, C \rangle|^2 \le rn^2 sn|C|/k = |A||B||C|n/k$, by the Cauchy-Schwarz inequality and the
estimate for $\|\alpha f\|$ obtained earlier, while $rsn|C| = |A||B||C|/n$. The result is therefore
true provided

$|A||B||C|n/k \le \eta^2|A|^2|B|^2|C|^2/n^2$,

and this inequality follows from our assumption. □

Recently, Kedlaya [13] proved a sort of converse to Theorem 3.3: under the additional
hypothesis that $\Gamma$ admits a transitive action on a reasonably large finite set, there exist
sets A, B and C such that $|A||B||C| \ge c|\Gamma|^3/k$ and such that there do not exist $a \in A$,
$b \in B$ and $c \in C$ with ab = c.

Theorems 3.1 and 3.3 immediately give the following corollary, which is the result 
promised at the beginning of the section. 

Corollary 3.4. Let $\Gamma$ be the group $\mathrm{PSL}_2(q)$ and let $n = |\Gamma|$. Then $\Gamma$ has no product-free
subset of cardinality greater than $2n^{8/9}$.

Proof. This follows from Theorems 3.1 and 3.3, since $n = q(q^2 - 1)/2$ and k can be
taken to be (q - 1)/2, which is greater than $n^{1/3}/8$. □
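
The arithmetic behind Corollary 3.4 is easy to check numerically. The sketch below only tests the two inequalities used in the proof, and for simplicity lets q range over all odd integers from 3 to 1001 rather than only over prime powers; the function name is illustrative.

```python
def corollary_bounds(q):
    """Return (n, k, n / k^(1/3), 2 n^(8/9)) for the formulas n = q(q^2 - 1)/2, k = (q - 1)/2."""
    n = q * (q * q - 1) / 2
    k = (q - 1) / 2
    return n, k, n / k ** (1 / 3), 2 * n ** (8 / 9)


# For each q, check that k > n^(1/3)/8 and that n / k^(1/3) < 2 n^(8/9).
checks = []
for q in range(3, 1002, 2):
    n, k, threshold, bound = corollary_bounds(q)
    checks.append(k > n ** (1 / 3) / 8 and threshold < bound)
```

Every entry of `checks` is true, so in this range the size threshold $n/k^{1/3}$ from Theorem 3.3 is indeed below $2n^{8/9}$.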

§4. Quasirandom groups. 

The property we have just used for showing that a group $\Gamma$ does not contain a large
product-free set was that $\Gamma$ has no non-trivial low-dimensional representations. From this
we deduced that every large subset of $\Gamma$ gives rise to a directed Cayley graph that is
quasirandom. Now we shall show that these two properties, as well as several others, are
in fact equivalent. We shall use the word "quasirandom" for any group that has one, and 
hence all, of these properties, but there is a limit to how seriously this word should be 
taken. In particular, we do not have a model of random groups for which we can show 
that almost every group is quasirandom. (Gromov has, famously, defined a notion of 
random group, by taking a set of n generators and a certain number of random relations 
of prescribed length. However, his groups are infinite: to define a random finite group one 
would need enough relations to make it finite, but not enough to make it trivial, or very 
small. This could be a delicate matter.) 

A second difference between this notion of quasirandomness and the usual ones for
graphs and subsets of groups is that we do not have a "local" characterization, where
we count small configurations of a certain kind. (For graphs and subsets of groups these
configurations are 4-cycles and quadruples $ab^{-1} = cd^{-1}$, respectively.) Indeed, it seems
quite likely that no such characterization exists, and to see why, consider the case of the
group $S_n$. This is not quasirandom, since $A_n$ is a subgroup of index 2, but if you choose
a small number of permutations $\pi_1, \dots, \pi_k$ at random (here k should be thought of as
an absolute constant), then they will not have any small relations, so one will not have
any "local" evidence that they are not all even permutations. That is, $S_n$ appears to be
"locally indistinguishable" from $A_n$, which is quasirandom.

This may not be the end of the story, however, because there is a sense in which the
non-quasirandomness of $S_n$ is at least "polynomially detectable." Suppose that you are
given the multiplication table of $S_n$, but you are given it abstractly and not told the order
in which the permutations appear. Now suppose that you want an algorithm that will
partition the elements into even and odd permutations in polynomial time (in n!). You
can do it with a randomized algorithm as follows. Choose k elements at random from
the group. Then the probability that they all happen to be even permutations is $2^{-k}$,
and it is known that if they are all even then they almost surely generate $A_n$, while if
they are not all even then they almost surely generate $S_n$. The time it takes to find the
subgroup they generate is easily seen to be polynomial, so after a few attempts one will
almost certainly generate $A_n$ (and we will know that we have done so, since $A_n$ is the
only subgroup of $S_n$ of index 2). For a more general discussion of algorithms to find the
irreducible representations of a group G, see [1].
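
The randomized procedure just described can be sketched as follows. This is an illustration under stated assumptions rather than the paper's own implementation: the group is $S_5$ stored as tuples, the only operation used on elements is multiplication (standing in for the abstract multiplication table), and the function names, the choice k = 2, the attempt limit and the seed are all hypothetical.

```python
from itertools import permutations
import random


def mul(a, b):
    """Group multiplication; the algorithm uses nothing else about the elements."""
    return tuple(a[i] for i in b)


def generated_subgroup(gens, identity):
    """Closure of gens under multiplication (a subgroup, since the group is finite)."""
    seen = {identity}
    frontier = [identity]
    while frontier:
        g = frontier.pop()
        for s in gens:
            h = mul(g, s)
            if h not in seen:
                seen.add(h)
                frontier.append(h)
    return seen


def find_index_two_subgroup(group, identity, k=2, attempts=200, seed=0):
    """Pick k random elements repeatedly until they generate a subgroup of index 2."""
    rng = random.Random(seed)
    for _ in range(attempts):
        gens = [rng.choice(group) for _ in range(k)]
        H = generated_subgroup(gens, identity)
        if 2 * len(H) == len(group):
            return H
    return None


S5 = list(permutations(range(5)))
H = find_index_two_subgroup(S5, tuple(range(5)))
```

The returned set `H` is the set of even permutations, and its complement is the set of odd ones, although parity is never consulted by the search itself.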

Now let us begin the process of proving the main result of the section, the statement
that various properties of groups are equivalent. Before we get to the statement itself, we
shall need some mostly standard lemmas.

Lemma 4.1. Let $S$ be the unit sphere in $\mathbb{C}^n$ in the standard Euclidean norm, and let $\mu$ be
the standard rotation-invariant probability measure on $S$. Then
$\int\int |\langle v, w\rangle|^2 \, d\mu(v) \, d\mu(w) = n^{-1}$.

Proof. The integral in question is the mean square of the inner product of two random
unit vectors. This average is clearly unaffected if we fix one of the vectors. But if $(e_i)_{i=1}^n$ is
an orthonormal basis of $\mathbb{C}^n$, then $\int_S \sum_{i=1}^n |\langle v, e_i\rangle|^2 \, d\mu(v) = \int_S 1 \, d\mu(v) = 1$, so by symmetry
$\int_S |\langle v, e_1\rangle|^2 \, d\mu(v) = n^{-1}$. This proves the lemma. □

Lemma 4.2. Let $\alpha$ be a linear map from $\mathbb{C}^n$ to $\mathbb{C}^n$. Then $\operatorname{tr}(\alpha) = n \int_S \langle \alpha v, v\rangle \, d\mu$.

Proof. Let $(e_i)_{i=1}^n$ be an orthonormal basis. Then the trace of the matrix of $\alpha$ with
respect to this basis, and hence of $\alpha$ itself, is $\sum_{i=1}^n \langle \alpha e_i, e_i\rangle$. Since this is true for any
orthonormal basis, we may average over all of them. The result follows immediately. □

Lemma 4.3. Let $v_1$ and $v_2$ be two vectors in $\mathbb{C}^n$. Then $\langle v_1, v_2\rangle = n \int_S \langle v_1, w\rangle \langle w, v_2\rangle \, d\mu(w)$.

Proof. The proof is basically the same as that of Lemma 4.2, since for any orthonormal
basis $\langle v_1, v_2\rangle = \sum_{i=1}^n \langle v_1, e_i\rangle \langle e_i, v_2\rangle$, and once again we can average over all of them. □

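Lemma 4.4. Let $v_1, \dots, v_n$ be unit vectors in $\mathbb{C}^m$. Then $\sum_{i,j} |\langle v_i, v_j\rangle|^2 \ge m^{-1} n^2$.

Proof. The trick here is to notice that $|\langle v_i, v_j\rangle|^2 = \langle v_i \otimes \overline{v_i}, v_j \otimes \overline{v_j}\rangle$, where $v_i \otimes \overline{v_i}$ is the
$m \times m$ matrix with entries $v_i(p)\overline{v_i(q)}$, and the inner product is the standard inner product
on $\mathbb{C}^{m^2}$. It follows that

$\sum_{i,j} |\langle v_i, v_j\rangle|^2 = \Bigl\| \sum_{i=1}^n v_i \otimes \overline{v_i} \Bigr\|^2$.

Now $\operatorname{tr}(v_i \otimes \overline{v_i}) = 1$ for each $i$, so the trace of $\sum_{i=1}^n v_i \otimes \overline{v_i}$ is $n$, from which it follows that
the right hand side is at least $m^{-1} n^2$, which proves the lemma. □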

Note that Lemma 4.4 is sharp. Basically any sufficiently symmetric example shows
this, but one simple one is when $m \mid n$ and the vectors $v_i$ consist of $n/m$ copies of some
orthonormal basis. Lemma 4.1 proves that the result is sharp for a "continuous set" of
vectors. Given a set for which the lemma is sharp, the proof above shows that $\sum_{i=1}^n v_i \otimes \overline{v_i}$
is $n/m$ times the identity matrix. That is, the vectors $v_i$ give us a representation of the
identity, which is a well-known way of saying that they are nicely distributed round the
unit sphere.
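
The sharp example can be checked by direct computation. The following Python sketch is illustrative only (the function names are hypothetical): it evaluates the sum in Lemma 4.4 for $n/m$ copies of the standard basis of $\mathbb{C}^3$, where the bound $n^2/m$ is attained, and for a badly distributed family, where it is not.

```python
def inner(v, w):
    """Standard inner product on C^m, linear in the first argument."""
    return sum(a * b.conjugate() for a, b in zip(v, w))


def gram_energy(vectors):
    """Sum of |<v_i, v_j>|^2 over all ordered pairs (i, j)."""
    return sum(abs(inner(v, w)) ** 2 for v in vectors for w in vectors)


def lemma_bound(n, m):
    """The lower bound n^2 / m of Lemma 4.4 for n unit vectors in C^m."""
    return n * n / m


basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# n = 6 vectors in C^3: two copies of an orthonormal basis (the sharp case).
spread_out = basis * 2
# n = 6 vectors in C^3, all equal: as far from sharp as possible.
clustered = [(1, 0, 0)] * 6

sharp_value = gram_energy(spread_out)      # equals lemma_bound(6, 3)
clustered_value = gram_energy(clustered)   # strictly larger than the bound
```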

With these lemmas in place, we are ready for our main result of the section.

Theorem 4.5. Let $G$ be a finite group. Then the following are polynomially equivalent.

(i) For every subset $A \subset G$, the directed Cayley graph with generators in $A$ is $c_1$-quasirandom.

(ii) For every subset $A \subset G$ and every function $f : G \to \mathbb{C}$ that sums to 0, $\|A * f\| \le c_2 n^{1/2} |A|^{1/2} \|f\|$.

(iii) Every function $f$ from $G$ to the closed unit disc in $\mathbb{C}$ such that $\sum_g f(g) = 0$ is $c_3$-quasirandom.

(iv) For every function $f$ from $G$ to the closed unit disc in $\mathbb{C}$ such that $\sum_g f(g) = 0$,
the function $F(x, y) = f(xy^{-1})$ is $c_4$-quasirandom on $G \times G$.

(v) Every non-trivial representation of $G$ has dimension at least $c_5^{-1}$.

Proof. The proof that (v) implies (i) and (ii) is essentially contained in the argument
of the previous section. Indeed, suppose that the smallest dimension of a non-trivial
representation is $k$, and let $A \subset G$. Let $\Gamma$ be the directed Cayley graph of $A$ and let $X$ be
the space of all functions $f$ such that $\sum_x f(x) = 0$ and $\|A * f\| / \|f\|$ is maximized (together
with the zero function). Let $\lambda$ be the maximum value of this ratio. Then $X$ is invariant
under the right-regular representation of $G$, so by hypothesis it has dimension at least $k$.
Lemma 2.7 implies that $k\lambda^2 \le |A| n$, so $\lambda \le (n|A|/k)^{1/2}$. This means that if (v) holds then
(ii) holds with $c_2 = c_5^{1/2}$.

From this and Lemma 2.7 it follows that the number of appropriately directed 4-cycles
in $\Gamma$ is at most $|A|^4 + n^2 |A|^2 / k$. In particular, whatever the cardinality of $A$, the graph is
at least $k^{-1}$-quasirandom.

We proved that (iii) and (iv) were equivalent in Theorem 2.5. 

Now let us prove that (iii) implies (v). That is, given a non-trivial representation
of dimension $m$, let us construct from it a function $f$ that fails to be $c$-quasirandom for
some $c$ that depends polynomially on $m$. This we do by an averaging argument, which
will exploit the lemmas we have just proved. To simplify the notation, we shall write the
average of a function $f$ defined on the sphere $S$ as $\mathbb{E}_v f(v)$ instead of $\int_S f(v) \, d\mu(v)$.

A standard and easy lemma of representation theory tells us that if $G$ has a representation
$\rho$ then there is an inner product on the vector space $V$ on which $G$ acts such that
the representation is unitary. Therefore, we may assume that $\rho$ already has this property.
Also, it will be convenient to assume, as we obviously can, that $\rho$ is irreducible. To simplify
the notation yet further, if $v \in V$ and $g \in G$ we shall write $gv$ instead of $\rho(g)(v)$.

Given any two vectors $v$ and $w$ in the unit sphere $S$ of $V$, let $f_{v,w} : G \to \mathbb{C}$ be defined
by $f_{v,w}(g) = \langle gv, w\rangle$. Notice that $|f_{v,w}(g)| \le 1$ for every $g$. Furthermore, for any $g'$ we
have

$\sum_g gv = \sum_g g'gv = g' \Bigl( \sum_g gv \Bigr)$.

Since $\rho$ is irreducible, it follows that $\sum_g gv = 0$ (or it would generate a 1-dimensional invariant
subspace of $V$ and $\rho$ would not be irreducible). Therefore, $\sum_g f_{v,w}(g) = \sum_g \langle gv, w\rangle = 0$.
Our averaging argument will show that at least one of these functions $f_{v,w}$ fails to have
the property in (iii), if $c_3 < m^{-3}$.

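By Lemma 4.3 (for the second equality),

$\mathbb{E}_w \mathbb{E}_g f_{v,w}(g) \overline{f_{v,w}(gh)} = \mathbb{E}_g \mathbb{E}_w \langle gv, w\rangle \langle w, ghv\rangle = m^{-1} \mathbb{E}_g \langle gv, ghv\rangle = m^{-1} \langle v, hv\rangle$.

Therefore, by Lemma 4.2,

$\mathbb{E}_v \mathbb{E}_w \mathbb{E}_g f_{v,w}(g) \overline{f_{v,w}(gh)} = m^{-2} \, \overline{\operatorname{tr} h}$.

Therefore, by the Cauchy-Schwarz inequality,

$\mathbb{E}_v \mathbb{E}_w \bigl| \mathbb{E}_g f_{v,w}(g) \overline{f_{v,w}(gh)} \bigr|^2 \ge m^{-4} |\operatorname{tr} h|^2$.

From this it follows that

$\mathbb{E}_v \mathbb{E}_w \mathbb{E}_h \bigl| \mathbb{E}_g f_{v,w}(g) \overline{f_{v,w}(gh)} \bigr|^2 \ge m^{-4} \, \mathbb{E}_h |\operatorname{tr} h|^2$,

and hence that there exist $v$ and $w$ such that

$\mathbb{E}_h \bigl| \mathbb{E}_g f_{v,w}(g) \overline{f_{v,w}(gh)} \bigr|^2 \ge m^{-4} \, \mathbb{E}_h |\operatorname{tr} h|^2$.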



We now have the task of bounding $\mathbb{E}_h |\operatorname{tr} h|^2$ from below. But $\mathbb{E}_h |\operatorname{tr} h|^2 =
\mathbb{E}_g \mathbb{E}_h |\operatorname{tr}(gh^{-1})|^2 = \mathbb{E}_g \mathbb{E}_h |\langle A_g, A_h\rangle|^2$, where $A_g$ and $A_h$ are the unitary matrices corresponding
to $g$ and $h$ and the inner product comes from considering $A_g$ and $A_h$ as vectors
in $\mathbb{C}^{m^2}$ and taking the standard inner product there. Since these vectors have norm $\sqrt{m}$,
Lemma 4.4 implies that $\mathbb{E}_g \mathbb{E}_h |\langle A_g, A_h\rangle|^2 \ge m$. Putting all this together, we find that

$\mathbb{E}_h \bigl| \mathbb{E}_g f_{v,w}(g) \overline{f_{v,w}(gh)} \bigr|^2 \ge m^{-3}$,

completing the proof that (iii) implies (v).

All that remains to prove the theorem is to show that (i) implies (iii). That is, given
a non-quasirandom function defined on $G$, we would like to construct from it a 01-valued
function that gives rise to a Cayley graph that is also not quasirandom. Since this argument
is standard, we shall be slightly sketchy about it.

It can be shown that the formula

$\|F\| = \Bigl( \sum_{x,x'} \Bigl| \sum_y F(x, y) \overline{F(x', y)} \Bigr|^2 \Bigr)^{1/4}$

defines a norm $\|\cdot\|$ on the space of functions $F : G \times G \to \mathbb{C}$. (This is a fairly easy lemma:
a proof can be found in [9].) It follows from the triangle inequality that if $F$ fails to be
$c$-quasirandom, then either $\operatorname{Re} f$ or $\operatorname{Im} f$ fails to be $(c/16)$-quasirandom. Therefore, if $f$ is
a function for which (iii) fails, then there must exist a function $u$ with values in $[-1, 1]$ and
average 0 such that

$\sum_g \Bigl( \sum_h u(h) u(gh) \Bigr)^2 > c_3 |G|^3 / 16$.

Now let $v(g) = (1 + u(g))/2$ for every $g \in G$. Then a standard argument shows that

$\sum_g \Bigl( \sum_h v(h) v(gh) \Bigr)^2 > |G|^3/16 + c_3 |G|^3/256 = (1 + c_3/16) |G|^3/16$.

(The argument is to expand the left-hand side into a sum of sixteen terms and observe
that

$\sum_g \Bigl( \sum_h v(h) v(gh) \Bigr)^2 - \frac{|G|^3}{16} - \frac{1}{16} \sum_g \Bigl( \sum_h u(h) u(gh) \Bigr)^2$

is a sum of squares.)
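
The expansion can be tested on a small example. The sketch below is an illustration under assumptions chosen here, not part of the argument: it takes $G = S_3$, a hypothetical function $u$ with values in $[-1, 1]$ and average 0, sets $v = (1 + u)/2$, and compares the two sides in exact rational arithmetic. For a function $u$ that sums to zero the difference displayed above is in fact 0, because the cross terms vanish.

```python
from fractions import Fraction
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p', as a tuple."""
    return tuple(p[q[i]] for i in range(len(p)))


def autocorrelation_energy(f, group):
    """Sum over g of (sum over h of f(h) f(gh))^2, for a real function f."""
    return sum(
        sum(f[h] * f[compose(g, h)] for h in group) ** 2 for g in group
    )


group = list(permutations(range(3)))  # the six elements of S_3
n = len(group)

# Hypothetical test function: values in [-1, 1] that sum to zero.
values = [Fraction(1), Fraction(-1), Fraction(1, 2),
          Fraction(-1, 2), Fraction(1, 4), Fraction(-1, 4)]
u = dict(zip(group, values))
v = {g: (1 + u[g]) / 2 for g in group}

left_side = autocorrelation_energy(v, group)
right_side = Fraction(n ** 3, 16) + autocorrelation_energy(u, group) / 16
difference = left_side - right_side
```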

Now choose a subset $A \subset G$ randomly, putting $g$ into $A$ with probability $v(g)$, making
all choices independently. Writing $A$ also for the characteristic function of the set $A$, we
wish to estimate the sum

$\sum_g \Bigl( \sum_h A(h) A(gh) \Bigr)^2 = \sum_g \sum_{h,h'} A(h) A(gh) A(h') A(gh')$.

The number of choices of $(g, h, h')$ for which the elements $h$, $gh$, $h'$ and $gh'$ are not all distinct
is $O(|G|^2)$, and for all other choices the expected value of $A(h) A(gh) A(h') A(gh')$
is $v(h) v(gh) v(h') v(gh')$. Therefore, the expected value of the sum is at least
$(1 + c_3/20) |G|^3/16$ when $|G|$ is sufficiently large. Also, with very high probability $A$ has cardinality
at most $(1 + c_3/1000) |G|/2$ (again, if $|G|$ is sufficiently large). It follows that there exists
a set $A$ such that the directed Cayley graph defined by $A$ is not $c_3/32$-quasirandom. □

In the light of this theorem we make the following formal definition of a quasirandom
group. Recall that quasirandom functions were defined just after the proof of Theorem 
2.5. 

Definition. Let $G$ be a finite group and let $c > 0$. Then $G$ is $c$-quasirandom if every
function $f : G \to \mathbb{C}$ that has average zero and takes values of modulus at most 1 is
$c$-quasirandom.

We end this section with two further characterizations of quasirandom groups. The 
first one states that the quasirandom groups are precisely those that do not contain a 
large product-free set. In one direction this is the main assertion of Theorem 3.3, so 
we shall concentrate on the other direction. As commented in the introduction, this final 
equivalence is not a polynomial one: we shall show that if the largest product-free subset of 
$G$ has size $\delta |G|$, then $G$ has no non-trivial representation of dimension less than $C \log(1/\delta)$
for some absolute constant C. In the final section we shall discuss whether this result can 
be improved. 

Theorem 4.6. Let $G$ be a group of order $n$ and suppose that $G$ has a non-trivial representation
of dimension $k$. Then $G$ has a product-free subset of size at least $c^k n$, where
$c > 0$ is an absolute constant.

Proof. Let $\phi$ be a unitary representation of $G$ on $\mathbb{C}^k$. Without loss of generality $\phi$ is
irreducible, since otherwise we can find a representation with a smaller $k$. Also, without
loss of generality it is faithful, since otherwise we can replace $G$ by $G / \ker \phi$. Therefore,
without loss of generality the elements of $G$ are themselves unitary transformations of $\mathbb{C}^k$.

Now for any vector $v \in \mathbb{C}^k$ we have $\sum_{\alpha \in G} \alpha v = 0$, since it is invariant under left
multiplication by any $\beta \in G$ and the representation is irreducible. It follows from Lemma
4.2 that the average trace of an element of $G$ is 0. Since the trace of a unitary operator
has modulus at most $k$, it follows that the number of elements $\alpha \in G$ such that $\operatorname{tr} \alpha$ has
real part greater than $k/2$ is at most $2n/3$. That is, at least $n/3$ elements of $G$ have trace
with real part less than or equal to $k/2$.

Now the trace is the sum of the eigenvalues, so if $\operatorname{tr} \alpha$ has real part at most $k/2$, there
must be an eigenvalue $\omega$ with real part at most 1/2.

Let $X$ be the set of all $\alpha \in G$ such that $\operatorname{tr} \alpha$ has real part at most $k/2$, and for each $\alpha \in X$ let $v(\alpha)$ be a
unit eigenvector with eigenvalue $\omega(\alpha)$ that has real part at most 1/2.

Now let $\delta > 0$ be an absolute constant to be chosen later. By a standard volume
argument the unit sphere of $\mathbb{C}^k$ has a $\delta$-net of cardinality at most $(3/\delta)^{2k}$, so we can
choose at least $(\delta/3)^{2k} |X|$ elements $\alpha$ of $X$ such that all the vectors $v(\alpha)$ lie within $\delta$ of
some point and hence within $2\delta$ of each other. Therefore, we can choose at least $(\delta/4)^{2k} n$
elements $\alpha$ of $X$ such that all the $v(\alpha)$ are within $2\delta$ of each other and all the $\omega(\alpha)$ are
within $\delta$ of each other as well. Let $Y$ be a subset of $X$ with this property.

We would now like to show that, for any $\alpha$ and $\alpha'$ in $Y$, the vectors $\alpha v(\alpha)$ and $\alpha' v(\alpha)$
are close. This we deduce from the following equalities and inequalities, which all follow
from the properties of $Y$ and the fact that the elements of $G$ preserve distance:
$\alpha v(\alpha) = \omega(\alpha) v(\alpha)$; $\|\omega(\alpha) v(\alpha) - \omega(\alpha') v(\alpha)\| \le \delta$; $\|\omega(\alpha') v(\alpha) - \omega(\alpha') v(\alpha')\| \le 2\delta$;
$\omega(\alpha') v(\alpha') = \alpha' v(\alpha')$; $\|\alpha' v(\alpha') - \alpha' v(\alpha)\| \le 2\delta$. Therefore, by the triangle inequality,
$\|\alpha v(\alpha) - \alpha' v(\alpha)\| \le 5\delta$.

Now let $\alpha''$ be another element of $Y$. Then $\|\alpha v(\alpha) - \alpha'' v(\alpha)\| \le 5\delta$ as well. Also,
from the previous inequality and the fact that $\alpha$ is unitary, we can deduce that
$\|\alpha^2 v(\alpha) - \alpha\alpha' v(\alpha)\| \le 5\delta$. Therefore, if $\alpha\alpha' = \alpha''$ it follows that $\|\alpha^2 v(\alpha) - \alpha v(\alpha)\| \le 10\delta$, and hence
that $\|\alpha v(\alpha) - v(\alpha)\| \le 10\delta$, and finally that $|\omega(\alpha) - 1| \le 10\delta$. But we know that $\omega(\alpha)$ is
a complex number with modulus 1 and real part at most 1/2, from which it follows that
$|\omega(\alpha) - 1| \ge 1$. Therefore, $Y$ is product free as long as we choose $\delta$ to be less than 1/10.
Therefore, we can find a product-free subset $Y$ of $G$ of size at least $c^k n$ with $c$ a positive
absolute constant (in fact, 1/2000 will do), which proves the theorem. □
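
The simplest instance of this construction is the case $k = 1$, which the following Python sketch illustrates under assumptions made here for concreteness: the group is $S_4$ and the representation is the sign representation. Each element then acts as the scalar $\pm 1$, the elements whose eigenvalue has real part at most 1/2 are exactly the odd permutations, and all these eigenvalues coincide, so the clustering step is vacuous and $Y = X$. The helper names are hypothetical.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p', as a tuple."""
    return tuple(p[q[i]] for i in range(len(p)))


def sign(p):
    """Value of the sign representation: +1 for even, -1 for odd."""
    inversions = sum(
        1 for i in range(len(p)) for j in range(i + 1, len(p)) if p[i] > p[j]
    )
    return 1 if inversions % 2 == 0 else -1


def is_product_free(subset):
    """True if no x, y, z in the subset satisfy xy = z."""
    members = set(subset)
    return all(compose(x, y) not in members for x in members for y in members)


group = list(permutations(range(4)))

# X: elements whose (only) eigenvalue has real part at most 1/2.
X = [g for g in group if sign(g) <= 0.5]
# For comparison, the complementary set, which contains the identity.
complement = [g for g in group if sign(g) > 0.5]

x_is_product_free = is_product_free(X)
complement_is_product_free = is_product_free(complement)
```

Here $X$ has density 1/2, far better than the bound $c^k n$; the loss in the theorem comes entirely from the net argument, which is not needed when $k = 1$.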

Our final characterization of quasirandom groups states that a group G is quasirandom 
if and only if every quotient of G is large and non-Abelian. We start with a natural 
special case of this, showing that all non-cyclic finite simple groups are quasirandom. 
One could presumably prove this result with a better bound than we obtain by using 
the classification of finite simple groups and simply looking up the dimensions of their 
irreducible representations. However, our proof is elementary. (Even this elementary 
argument may well be known, but we have had trouble finding it in the literature. Laszlo 
Pyber has pointed out to me that a slightly stronger bound can be deduced from a theorem 
of Jordan, as later modified by Frobenius and Blichfeldt, which has an elementary proof. 
See [10, Theorem 14.12]. However, the argument below is simpler.)

Theorem 4.7. Let $G$ be a non-cyclic finite simple group of order $n$. Then every non-trivial
representation of $G$ has dimension at least $\sqrt{\log n}/2$.

Proof. Let $\phi : G \to U(k)$ be an irreducible unitary representation of $G$. Since $G$ is simple,
$\phi$ has trivial kernel, so without loss of generality $G$ itself is a finite subgroup of $U(k)$.

Let $\alpha$ be any element of $G$ other than the identity. We claim first that $\alpha$ has a
conjugate that does not commute with $\alpha$. To see this, suppose that all conjugates do
commute with $\alpha$. Then for any $\beta$ and $\gamma$ in $G$ we have

$(\beta\alpha\beta^{-1})(\gamma\alpha\gamma^{-1}) = \gamma(\gamma^{-1}\beta\alpha\beta^{-1}\gamma)\alpha\gamma^{-1} = \gamma\alpha(\gamma^{-1}\beta\alpha\beta^{-1}\gamma)\gamma^{-1} = (\gamma\alpha\gamma^{-1})(\beta\alpha\beta^{-1})$.

That is, all conjugates of $\alpha$ commute with each other. But the subgroup of $G$ generated
by conjugates of $\alpha$ is easily seen to be normal, and therefore all of $G$, which implies that
$G$ is Abelian. But in that case the only irreducible representations of $G$ are 1-dimensional,
which implies that $k = 1$ and $G$ is cyclic, contradicting our hypothesis.

Suppose now that $\alpha$ is the closest element of $G$, in the operator norm on $B(\mathbb{C}^k)$, to
the identity (apart of course from the identity itself), and let $\|\alpha - \iota\| = \epsilon$. Let $\beta$ be a
conjugate of $\alpha$ that does not commute with $\alpha$. Then $\|\beta - \iota\| = \epsilon$ as well, since $G$ consists
of unitary transformations. Write $\alpha = \iota + \gamma$ and $\beta = \iota + \eta$. Then $\alpha\beta - \beta\alpha = \gamma\eta - \eta\gamma$.
Therefore, since $\alpha^{-1}\beta^{-1}$ is unitary, $\|\iota - \alpha\beta\alpha^{-1}\beta^{-1}\| = \|\gamma\eta - \eta\gamma\|$. Since $\alpha$ and $\beta$ do not
commute, and are closest elements to the identity, it follows that $\|\gamma\eta - \eta\gamma\| \ge \epsilon$. But we
also know that $\|\gamma\eta - \eta\gamma\| \le 2\|\gamma\|\|\eta\| = 2\epsilon^2$. Therefore, $\epsilon \ge 1/2$, which implies that no two
elements of $G$ are closer than 1/2 in the operator norm.

It remains to determine an upper bound for the size of a 1/2-separated subset of $U(k)$.
But $U(k)$ is contained in the unit ball of $B(\mathbb{C}^k)$. The volume argument mentioned in the
proof of Theorem 4.6 shows that for any $d$-dimensional real normed space and any $\epsilon > 0$ the
largest $\epsilon$-separated subset of the unit ball has size at most $(1 + 2/\epsilon)^d$. The normed space
$B(\mathbb{C}^k)$ is a $k^2$-dimensional complex space, so, setting $d = 2k^2$ and $\epsilon = 1/2$, we deduce
that a 1/2-separated subset of $U(k)$ has cardinality at most $25^{k^2}$. That is, $n \le 25^{k^2}$, from
which the theorem follows. □
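
The last step is a short calculation, which the following Python sketch makes explicit. It assumes that the logarithm is the natural one, which is what makes the deduction from $n \le 25^{k^2}$ work, since $\log 25 < 4$. The function names are hypothetical, and the remark about $A_5$ uses the standard fact that its smallest non-trivial irreducible representation has dimension 3.

```python
import math


def dimension_lower_bound(n):
    """The bound sqrt(log n) / 2 of Theorem 4.7, with the natural logarithm."""
    return math.sqrt(math.log(n)) / 2


def max_order_in_dimension(k):
    """The bound 25^(k^2) on the size of a 1/2-separated subset of U(k)."""
    return 25 ** (k * k)


# A_5 has order 60; the theorem only guarantees dimension greater than 1,
# whereas the true minimum dimension is 3.
bound_for_a5 = dimension_lower_bound(60)

# Consistency check: a group as large as the packing bound allows in
# dimension k still satisfies sqrt(log n) / 2 <= k.
consistent = all(
    dimension_lower_bound(max_order_in_dimension(k)) <= k for k in range(1, 8)
)
```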

Note that the alternating groups $A_n$ have representations of dimension $n - 1$ (since
they act on the subspace of $\mathbb{C}^n$ consisting of vectors whose coordinates add up to 0).
Therefore, the bound in Theorem 4.7 cannot be improved to more than $\log n / \log \log n$.

Theorem 4.8. Let $G$ be a group of order $n$ and suppose that for every proper normal
subgroup $H$ of $G$, the quotient $G/H$ is non-Abelian and has order at least $m$. Then $G$
has no non-trivial representation of dimension less than $\sqrt{\log m}/2$. Conversely, if $G$ has
an Abelian quotient, then $G$ has a 1-dimensional representation, and if $G$ has a quotient
of order $m$, then $G$ has a representation of dimension at most $\sqrt{m}$.

Proof. Let us quickly deal with the converse, since this is easy and not the main point of
interest. Any representation of a quotient of $G$ can be composed with the quotient map so
that it becomes a representation of $G$ of the same dimension. Therefore, the result follows
from two standard facts of representation theory: that the irreducible representations of
Abelian groups are 1-dimensional (and exist!), and that every group of order $m$ has a
representation of dimension at most $\sqrt{m}$. (This second fact follows from the result that
the sum of the squares of the dimensions of the irreducible representations is $m$.)

Now let us turn to the more interesting direction of the theorem. Let $H$ be a maximal
proper normal subgroup of $G$. Then the quotient group $G/H$ is simple and, by our
hypothesis, non-Abelian. Let $\phi : G \to U(k)$ be a unitary representation of $G$. If we knew
that the kernel of $\phi$ was $H$, then we would have a representation of $G/H$ to which we
could apply Theorem 4.7. However, this does not have to be the case, so instead we must
imitate the proof of Theorem 4.7, as follows.

We may clearly assume that $\phi$ is a faithful representation (or else we look at the
quotient of $G$ by its kernel). Therefore, we shall think of the elements of $G$ itself as unitary
maps on $\mathbb{C}^k$. Let us now define a metric on $G/H$ by taking $d(\alpha H, \beta H)$ to be the smallest
distance (in the operator norm again) between any element of $\alpha H$ and any element of
$\beta H$. Let $\alpha$ be an element of $G \setminus H$ such that the distance from $\alpha H$ to $H$, with respect to
this metric, is minimized, and note that this distance is just the smallest distance in the
operator norm from any element of $\alpha H$ to the identity. Without loss of generality, $\alpha$ itself
is an element of $\alpha H$ for which this minimum is attained.

Now $G/H$ is simple and non-Abelian. Hence, by the argument used in the proof of Theorem 4.7, we
can find a conjugate $\beta H$ of $\alpha H$ in $G/H$ that does not commute with $\alpha H$. It is easy to
see that we can choose the representative $\beta$ to be a conjugate of $\alpha$ in $G$, so let us do this.
Then $\beta$ is a conjugate of $\alpha$ such that not only do $\alpha$ and $\beta$ not commute, but $\alpha\beta$ and $\beta\alpha$ do not
even belong to the same coset of $H$. Moreover, the distance from $\beta$ to the identity is the
same as the distance from $\alpha$ to the identity. As in the proof of Theorem 4.7, let $\epsilon$ be this
distance, and let $\alpha = \iota + \gamma$ and $\beta = \iota + \eta$.

Once again, the distance between $\alpha\beta$ and $\beta\alpha$ is $\|\gamma\eta - \eta\gamma\|$, and therefore so is the
distance between $\iota$ and $\alpha\beta\alpha^{-1}\beta^{-1}$. Since $\alpha\beta\alpha^{-1}\beta^{-1}$ does not belong to $H$, it follows from
our minimality assumption that $\|\gamma\eta - \eta\gamma\| \ge \epsilon$, as before, and it is also at most $2\epsilon^2$ for
precisely the same reason as before. Therefore, no two elements of different cosets of $H$
can be within 1/2 of each other in the operator norm, so, by the upper bound given in the
proof of Theorem 4.7 for the size of a 1/2-separated subset of $U(k)$, there can be at most
$25^{k^2}$ cosets of $H$. This proves the theorem. □

A good example to bear in mind in connection with Theorem 4.8 and its proof is the
following family of groups. Let $p$ and $k$ be positive integers and let $G(p, k)$ be the subgroup
of $U(k)$ generated by all diagonal matrices with $p$th roots of unity as their diagonal entries,
and all permutation matrices corresponding to even permutations. Thus, a typical element
of $G(p, k)$ is a permutation matrix of determinant 1 with its 1s replaced by arbitrary $p$th
roots of unity. The subgroup $H(p, k)$ generated by just the diagonal matrices in $G(p, k)$
is normal, and the quotient is isomorphic to the alternating group $A_k$. Moreover, one can
show that any proper normal subgroup of $G(p, k)$ is contained in $H(p, k)$. Therefore, these
groups are quasirandom as $k$ tends to infinity, despite being of arbitrarily high order for
any fixed $k$. The reason this can happen is that, as the proof of Theorem 4.8 shows is
necessary, the cosets of $H(p, k)$ are well-separated.

In practice, Theorems 4.6 and 4.8 are not particularly useful characterizations of 
quasirandomness because the equivalences are not polynomial equivalences. In other words, 
they are fine if all one wants is qualitative statements (such as that no subset of positive 
density is product free) but too crude if one is interested in bounds of the kind obtained 
in this paper. However, sometimes a qualitative statement is interesting - for example, if 
one is wondering whether a particular family of groups is quasirandom and wants to make 
a preliminary check. For instance, Theorem 4.8 tells us that $\mathrm{SL}_2(p)$ is quasirandom, since
$\{\iota, -\iota\}$ is a maximal normal subgroup of very high index. However, this particular group
is much more quasirandom than Theorem 4.8 guarantees. As for Theorem 4.6, it can in 
fact be improved to a polynomial equivalence: this will be discussed in the final section. 

§5. Solving equations in quasirandom groups. 

The purpose of this section is to prove a generalization of Theorem 3.3: instead of 
finding $a$ and $b$ such that $a$, $b$ and $ab$ each lie in specified sets, we shall find $a_1, \dots, a_m$ such
that for every non-empty subset $F \subset \{1, 2, \dots, m\}$ the product of those $a_i$ with $i \in F$ lies
in a specified set. In other words, perhaps surprisingly, we can choose m elements of the 
group in such a way that exponentially many conditions are satisfied simultaneously, using 
only the fact that a reasonable number of elements satisfy each condition individually. 

Underlying the argument is the following basic lemma, which is a reformulation of 
the last part of Theorem 3.3 that will be slightly more convenient. The proof of the main 
theorem of this section will use it to drive an inductive argument. 

Lemma 5.1. Let $G$ be a group of order $n$ such that no non-trivial representation has
dimension less than $k$. Let $A$ and $B$ be two subsets of $G$ with densities $r$ and $s$,
respectively, and let $\delta$ and $t$ be two positive constants. Then, provided that $rst \ge (\delta^2 k)^{-1}$,
the number of group elements $x \in G$ for which $|A \cap xB| \le (1 - \delta) rsn$ is at most $tn$.

Proof. Let $C$ be the set $\{x^{-1} : x \in B\}$. Then

$|A \cap xB| = \sum_y A(y)(xB)(y) = \sum_y A(y) B(x^{-1}y) = \sum_y A(y) C(y^{-1}x) = A * C(x)$.

By Theorem 4.5, if $f : G \to \mathbb{R}$ sums to zero, then $\|A * f\| \le (r/k)^{1/2} n \|f\|$. Applying this
result in the case $f(x) = C(x) - s$ and noting that $\|f\|^2 = s(1 - s)n \le sn$, we deduce that
$\|A * C - rsn\|^2 \le rsn^3/k$. It follows that the number of $x$ such that $A * C(x) \le (1 - \delta) rsn$
is at most $n/(\delta^2 rsk)$. If $rst \ge (\delta^2 k)^{-1}$, then this is at most $tn$, as required. □
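
The identity $|A \cap xB| = A * C(x)$ at the start of the proof is easy to confirm by brute force. The sketch below is illustrative only: it works in $S_3$ with two arbitrarily chosen subsets, and `convolve` is a hypothetical helper implementing $(A * C)(x) = \sum_y A(y) C(y^{-1}x)$ for characteristic functions.

```python
from itertools import permutations


def compose(p, q):
    """Return the permutation 'apply q first, then p', as a tuple."""
    return tuple(p[q[i]] for i in range(len(p)))


def inverse(p):
    """Return the inverse permutation."""
    result = [0] * len(p)
    for i, image in enumerate(p):
        result[image] = i
    return tuple(result)


def convolve(A, C, group):
    """(A * C)(x) = number of y with y in A and y^{-1} x in C."""
    return {
        x: sum(1 for y in group if y in A and compose(inverse(y), x) in C)
        for x in group
    }


group = list(permutations(range(3)))  # S_3
A = set(group[:4])                    # arbitrary subsets, for illustration
B = set(group[2:5])
C = {inverse(b) for b in B}

convolution = convolve(A, C, group)
intersection_sizes = {
    x: len(A & {compose(x, b) for b in B}) for x in group
}
```

Summing either side over all $x$ gives $|A||B|$, which is the first-moment fact behind the mean value $rsn$ in the proof.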

Note the following easy consequence of Lemma 5.1, which shows that it is indeed
effectively the same as Theorem 3.3. Suppose that $rst > 1/k$ and that $C$ is a subset of $G$
with density $t$. Lemma 5.1 with $\delta = 1$ tells us that the number of $y$ such that $A \cap y^{-1}B = \emptyset$
is less than $tn$, from which it follows that there exists $y \in C$ such that $A \cap y^{-1}B \ne \emptyset$. But
then, if $x \in A \cap y^{-1}B$, we have $x \in A$, $y \in C$ and $yx \in B$.

In order to make the proof of our general theorem more transparent, we begin with 
the special case m = 3. 

Theorem 5.2. Let $G$ be a group of order $n$ such that no non-trivial representation has
dimension less than $k$. Let $A_1$, $A_2$, $A_3$, $A_{12}$, $A_{13}$, $A_{23}$ and $A_{123}$ be subsets of $G$ of densities
$p_1$, $p_2$, $p_3$, $p_{12}$, $p_{13}$, $p_{23}$ and $p_{123}$, respectively. Then, provided that $p_1p_2p_{12}$, $p_1p_3p_{13}$,
$p_1p_{23}p_{123}$ and $p_2p_3p_{23}p_{12}p_{13}p_{123}$ are all at least $16/k$, there exist elements $x_1 \in A_1$, $x_2 \in A_2$
and $x_3 \in A_3$ such that $x_1x_2 \in A_{12}$, $x_1x_3 \in A_{13}$, $x_2x_3 \in A_{23}$ and $x_1x_2x_3 \in A_{123}$.

Proof. We start by choosing $x_1$, noting that there are certain conditions it will have
to satisfy if there is to be any hope of continuing the proof. For example, later we shall
need to choose $x_2 \in A_2$ such that $x_1x_2 \in A_{12}$. Equivalently, we shall need $x_2$ to belong
to $A_2 \cap x_1^{-1}A_{12}$. Similarly, we shall need $x_3 \in A_3 \cap x_1^{-1}A_{13}$ and $x_2x_3 \in A_{23} \cap x_1^{-1}A_{123}$.
Therefore, we want these sets to be not just non-empty, but reasonably large.

By Lemma 5.1, the number of $x_1$ such that $|A_2 \cap x_1^{-1}A_{12}| < p_2p_{12}n/2$ is at most $p_1n/4$,
provided that $p_1p_2p_{12} \ge 16/k$. Similarly, if $p_1p_3p_{13} \ge 16/k$ and $p_1p_{23}p_{123} \ge 16/k$, then
the number of $x_1$ such that $|A_3 \cap x_1^{-1}A_{13}| < p_3p_{13}n/2$ is at most $p_1n/4$ and the number
of $x_1$ such that $|A_{23} \cap x_1^{-1}A_{123}| < p_{23}p_{123}n/2$ is at most $p_1n/4$. Therefore, provided these
inequalities hold, we can choose $x_1 \in A_1$ such that, setting $B_2 = A_2 \cap x_1^{-1}A_{12}$, $B_3 = A_3 \cap x_1^{-1}A_{13}$
and $B_{23} = A_{23} \cap x_1^{-1}A_{123}$, $q_2 = p_2p_{12}/2$, $q_3 = p_3p_{13}/2$ and $q_{23} = p_{23}p_{123}/2$, we have $|B_2| \ge q_2n$,
$|B_3| \ge q_3n$ and $|B_{23}| \ge q_{23}n$.

At this point we could quote our results about product-free sets, but instead let us
repeat the argument (which is more or less an equivalent thing to do). We would like to
choose $x_2 \in B_2$ such that $B_3 \cap x_2^{-1}B_{23}$ is non-empty. Lemma 5.1 implies that the number of
$x_2$ such that $B_3 \cap x_2^{-1}B_{23}$ is empty is at most $q_2n/2$, provided that $q_2q_3q_{23} \ge 2/k$. Therefore,
provided we have this inequality, which, when expanded, says that $p_2p_3p_{23}p_{12}p_{13}p_{123} \ge
16/k$, there exist $x_2 \in B_2$ and $x_3 \in B_3$ such that $x_2x_3 \in B_{23}$. But then $x_1$, $x_2$ and $x_3$
satisfy the conclusion of the theorem. □

It is clear that the above argument can be generalized. The only thing that is not
quite obvious is the density conditions that emerge from the resulting inductive argument.
Here is what they are. Suppose that for every subset $F \subset \{1, 2, \dots, m\}$ we have a subset
$A_F$ of a group $G$ with density $p_F$ and suppose that no non-trivial representation of $G$
has dimension less than $k$. Now let $h$ be an integer less than $m$ and let $E$ be a subset of
$\{h + 1, \dots, m\}$. Let $\mathcal{A}_{h,E}$ be the collection of all sets of the form $U \cup V$, where $\max U < h$
and $V$ is either $\{h\}$, $E$ or $\{h\} \cup E$. We shall say that the sets $A_F$ satisfy the $(h, E)$-density
condition if $\prod_{F \in \mathcal{A}_{h,E}} p_F$ is at least $2^{3m}/k$. We shall say that they satisfy the density
condition if they satisfy the $(h, E)$-density condition for every $h < m$ and every non-empty set
$E \subset \{h + 1, \dots, m\}$.

To get an idea of what this means, notice that the inequalities we assumed in Theorem 5.2
are the $(1, \{2\})$-condition, the $(1, \{3\})$-condition, the $(1, \{2, 3\})$-condition and the
$(2, \{3\})$-condition, respectively, except that there we had a slightly better dependence on
$m$.
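
Since the definition is purely combinatorial, it can be unpacked mechanically. The Python sketch below is one possible reading of it, with hypothetical function names: `collection(h, E)` lists the sets in $\mathcal{A}_{h,E}$, `conditions(m)` lists the pairs $(h, E)$ that have to be checked, and `satisfies_density_condition` tests the products against $2^{3m}/k$ for a given assignment of densities.

```python
from fractions import Fraction
from itertools import combinations
from math import prod


def subsets(elements):
    """All subsets of the given elements, as frozensets."""
    elements = sorted(elements)
    return [
        frozenset(c)
        for r in range(len(elements) + 1)
        for c in combinations(elements, r)
    ]


def collection(h, E):
    """All U ∪ V with U a subset of {1,...,h-1} and V in {{h}, E, {h} ∪ E}."""
    E = frozenset(E)
    choices_for_V = [frozenset({h}), E, E | {h}]
    return {U | V for U in subsets(range(1, h)) for V in choices_for_V}


def conditions(m):
    """All pairs (h, E) with h < m and E a non-empty subset of {h+1,...,m}."""
    return [
        (h, E)
        for h in range(1, m)
        for E in subsets(range(h + 1, m + 1))
        if E
    ]


def satisfies_density_condition(density, m, k):
    """density maps each non-empty frozenset F of {1,...,m} to p_F."""
    return all(
        prod(Fraction(density[F]) for F in collection(h, E)) * k >= 2 ** (3 * m)
        for h, E in conditions(m)
    )


# For m = 3 these are the four products appearing in Theorem 5.2.
for h, E in conditions(3):
    print(h, sorted(E), sorted(sorted(F) for F in collection(h, E)))
```

Exact fractions are used so that the comparison with $2^{3m}/k$ is not affected by rounding.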

Theorem 5.3. Let $G$ be a group of order $n$ such that no non-trivial representation has
dimension less than $k$. For each non-empty subset $F \subset \{1, 2, \dots, m\}$ let $A_F$ be a subset
of $G$ of density $p_F$, and suppose that this collection of sets satisfies the density condition.
Then there exist elements $x_1, \dots, x_m$ of $G$ such that $x_F \in A_F$ for every $F$, where $x_F$ stands
for the product of all $x_i$ such that $i \in F$, written with the indices in increasing order.

Proof. By the density condition, for every non-empty subset $F \subset \{2, \dots, m\}$ we have the
inequality $2^{-m} p_1 p_F p_{1F} \ge 2^{2m}/k$. (Here we use the shorthand $1F$ to stand for $\{1\} \cup F$.)
Therefore, by Lemma 5.1, for each $F$ the number of $x_1$ such that $|A_F \cap x_1^{-1}A_{1F}| \le
p_F p_{1F}(1 - 2^{-m})n$ is at most $p_1n/2^m$. Therefore, the number of $x_1$ such that $|A_F \cap x_1^{-1}A_{1F}| \le
p_F p_{1F}(1 - 2^{-m})n$ for at least one non-empty $F \subset \{2, \dots, m\}$ is at most $p_1n/2$. It follows
that there exists $x_1 \in A_1$ such that, if for every non-empty $F \subset \{2, \dots, m\}$ we set $B_F =
A_F \cap x_1^{-1}A_{1F}$, then every $B_F$ has density at least $q_F = p_F p_{1F}(1 - 2^{-m})$.

We claim now that the sets $B_F$ satisfy the density condition (after a relabelling of the
index set). Let $h < m$ and let $E$ be a non-empty subset of $\{h + 1, \dots, m\}$. Define $\mathcal{B}_{h,E}$ to
be the set of all $F$ of the form $U \cup V$ with $U \subset \{2, \dots, h - 1\}$ and $V$ equal to $\{h\}$, $E$ or
$\{h\} \cup E$. Then

$\prod_{F \in \mathcal{B}_{h,E}} q_F \ge (1 - 2^{-m})^{2^m} \prod_{F \in \mathcal{B}_{h,E}} p_F p_{1F} = (1 - 2^{-m})^{2^m} \prod_{F \in \mathcal{A}_{h,E}} p_F$.

But $(1 - 2^{-m})^{2^m} \ge 1/4$ and $\prod_{F \in \mathcal{A}_{h,E}} p_F \ge 2^{3m}/k$, so this implies that $\prod_{F \in \mathcal{B}_{h,E}} q_F \ge
2^{3(m-1)}/k$. Therefore, the sets $B_F$ satisfy the density condition.

This proves the inductive step of the theorem. To be on the safe side, we take as our
base case the case $m = 2$. (We do this so that we do not have to worry about the definition
of the density condition when $E$ cannot be non-empty.) This follows easily from the remark
following Lemma 5.1 if one sets $A_1 = C$, $A_2 = A$ and $A_{12} = B$. The density condition in
this case is stronger than the hypothesis we needed to guarantee the existence of $x_1$ and
$x_2$ such that $x_1 \in A_1$, $x_2 \in A_2$ and $x_1x_2 \in A_{12}$. Therefore, the theorem is proved. □

We now give a couple of corollaries of Theorem 5.3. They are special cases of the 
theorem: the only extra content is that we need to do a small amount of calculation to 
optimize certain densities while preserving the density condition. 

Corollary 5.4. Let $G$ be a group of order $n$ such that no non-trivial representation has
dimension less than $k$. For each non-empty subset $F \subset \{1, 2, \dots, m\}$ let $A_F$ be a subset
of $G$ of density $p$. Then, provided that $p^{3 \cdot 2^{m-2}} > 2^{3m}/k$ (which is true if $p > 2k^{-1/2^m}$),
there exist $x_1, \dots, x_m$ such that $x_F \in A_F$ for every $F$.

Proof. Since all the densities are the same, all we have to do is look at which set $\mathcal{A}_{h,E}$ is
largest. Obviously they get larger as $h$ gets larger, so the largest one is when $h = m - 1$.
This has size $3 \cdot 2^{m-2}$ since there are $2^{m-2}$ possibilities for $U$ and 3 possibilities for $V$. The
result now follows from Theorem 5.3. □

Corollary 5.5. Let $G$ be a group of order $n$ such that no non-trivial representation has
dimension less than $k$. For every pair $1 \le i < j \le m$ let $A_{ij}$ be a set of density $p$. Then,
provided that $p > 4k^{-1/(2m-3)}$, there exist $x_1, \dots, x_m$ such that $x_ix_j \in A_{ij}$ for every $i < j$.

Proof. We shall apply Theorem 5.3 again, setting $A_F$ to be $G$ whenever $F$ has cardinality
other than 2. Then $p_F = p$ if $F$ has cardinality 2, and $p_F = 1$ otherwise. Now let us work
out how many sets of size 2 are contained in $\mathcal{A}_{h,E}$. If $E$ has cardinality greater than 2
then there are $h - 1$ such sets, since then $V$ must equal $\{h\}$ and $U$ must be a singleton. If
$E$ has cardinality equal to 2 then there is one further set, namely $E$ itself, so there are $h \le m - 2$ such sets. If
$E$ has cardinality equal to 1 then there are $2h - 1$ sets, since either $U$ is a singleton and
$V$ is $\{h\}$ or $E$, or $U$ is empty and $V$ is $\{h\} \cup E$. Since the largest possible value of $h$ is
$m - 1$, this tells us that the sequence exists provided that $p^{2m-3} > 2^{3m}/k$, which implies
the corollary. □
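
The two counts used in the proofs of Corollaries 5.4 and 5.5 can be confirmed by enumeration for small $m$. The following Python sketch does this under the reading of the definition of $\mathcal{A}_{h,E}$ given earlier; the function names are hypothetical. It checks that the largest collection has $3 \cdot 2^{m-2}$ members and that no collection contains more than $2m - 3$ sets of size 2.

```python
from itertools import combinations


def collection(h, E):
    """All U ∪ V with U a subset of {1,...,h-1} and V in {{h}, E, {h} ∪ E}."""
    E = frozenset(E)
    tails = [frozenset({h}), E, E | {h}]
    heads = [
        frozenset(c) for r in range(h) for c in combinations(range(1, h), r)
    ]
    return {U | V for U in heads for V in tails}


def all_conditions(m):
    """Yield every pair (h, E) with h < m and E non-empty in {h+1,...,m}."""
    for h in range(1, m):
        rest = range(h + 1, m + 1)
        for r in range(1, len(rest) + 1):
            for E in combinations(rest, r):
                yield h, frozenset(E)


def largest_collection_size(m):
    """Maximum number of sets in any collection (used in Corollary 5.4)."""
    return max(len(collection(h, E)) for h, E in all_conditions(m))


def most_pairs(m):
    """Maximum number of 2-element sets in any collection (Corollary 5.5)."""
    return max(
        sum(1 for F in collection(h, E) if len(F) == 2)
        for h, E in all_conditions(m)
    )


counts = {m: (largest_collection_size(m), most_pairs(m)) for m in range(3, 7)}
```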

It is possible to generalize Theorem 5.3 slightly further by exploiting two facts about 
Lemma 5.1. Instead of giving full details, we shall merely state two results and briefly 
explain how they are proved. 

Theorem 5.6. Let $G$ be a group of order $n$ such that no non-trivial representation has
dimension less than $k$. For every pair $1 \le i < j \le m$ let $A_{ij}$ be a set of density $p$. Then,
provided that $p > 4k^{-1/(2m-3)}$, there exist $x_1, \dots, x_m$ such that $x_ix_j^{-1} \in A_{ij}$ for every
$i < j$.

Theorem 5.7. Let $G$ be a group of order $n$ such that no non-trivial representation has
dimension less than $k$. Let $A_1$, $A_2$, $A_3$, $A_{12}$, $A_{13}$, $A_{23}$ and $A_{123}$ be subsets of $G$ of densities
$p_1$, $p_2$, $p_3$, $p_{12}$, $p_{13}$, $p_{23}$ and $p_{123}$, respectively. Then, provided that $p_1p_2p_{12}$, $p_1p_3p_{13}$,
$p_1p_{23}p_{123}$ and $p_2p_3p_{23}p_{12}p_{13}p_{123}$ are all at least $16/k$, there exist elements $x_1 \in A_1$, $x_2 \in A_2$
and $x_3 \in A_3$ such that $x_1x_2 \in A_{12}$, $x_3x_1 \in A_{13}$, $x_2x_3^{-1} \in A_{23}$ and $x_1x_2x_3^{-1} \in A_{123}$.

To prove statements like this, one exploits Lemma 5.1 and its method of proof to the 
full. Not only can one show that $A \cap xB$ is nearly always about the same size (when A 
and B are large enough), but one can show the same for $A \cap x^{-1}B$, $A \cap Bx$ and $A \cap Bx^{-1}$. 
The inductive proof 
of Theorem 5.3 works as long as at each stage of the inductive process the variable one 
is trying to choose, or its inverse, appears either at the beginning or at the end of each 
product. So, for example, in Theorem 5.7 one starts by choosing $x_1$ such that $A_2 \cap x_1^{-1}A_{12}$, 
$A_3 \cap A_{13}x_1^{-1}$ and $A_{23} \cap$ [...] are all large. One is then left needing to place $x_2$, $x_3$ and 
$x_2x_3^{-1}$ into these sets, which can clearly be done. 

Remarks. Although it may at first seem surprising that one can cause so many equations 
to be satisfied simultaneously, there is an intuitive explanation for this, at least for readers 
familiar with the notion of higher-degree uniformity for subsets of Abelian groups. (See [8, 
Section 3] for a definition of this.) In that terminology, Lemma 5.1 shows that all dense 
subsets of G have a property very similar to uniformity. But if that is the case, then 
almost all intersections of a dense set A with a translate of itself will still be dense, and 
will therefore be uniform as well, which shows that A has a sort of non-Abelian version 
of quadratic uniformity. But if uniformity implies quadratic uniformity, then it implies 
uniformity of all degrees. In the Abelian case, the higher the degree of uniformity a set 
has, the more linear equations one can hope to solve simultaneously in that set, so it is 
not too surprising after all that one can solve large numbers of equations simultaneously 
in subsets of a group where every dense set is uniform. 

Another interesting aspect of Theorem 5.3 is that under certain circumstances it can 
yield very good bounds. For simplicity let us consider the case where all the sets $A_F$ have 
density either p or 1, and let $\mathcal{F}$ be the set of F such that the density is p. Suppose that no 
element of $\{1, 2, \dots, m\}$ is contained in more than r of the sets $F \in \mathcal{F}$. Then no set $\mathcal{A}_{h,E}$ 
can contain more than 2r elements of $\mathcal{F}$, so we can satisfy all the conditions simultaneously 
if $p^{2r} \ge 2^{3m}/k$. That is, for fixed r we can obtain a power that is independent of m. (With 
a bit of care, the exponential dependence of the constant on m can be improved as well.) 
This situation would arise if, for example, we wanted $x_ix_j$ to belong to $A_{ij}$ whenever ij 
was an edge of a certain graph H of maximal degree 10. 

§6. Open questions. 

The results of this paper leave several questions unanswered. One that has been 
mentioned already is the following (which is not formulated in a precise manner). 

Question 6.1. Is there a good model for large random finite groups with the property 
that a group chosen according to this model has a high probability of being quasirandom? 

Another question that has been touched on is whether Theorem 4.6 can be improved. 
More precisely, in an earlier draft of this paper the following was asked. 

Question 6.2. If G has a non-trivial representation of dimension k, does G have a 
product-free subset of size cn for some c that depends polynomially on $k^{-1}$? 

I am grateful to Laszlo Pyber for informing me that the answer is yes, for the following 
reason. It can be shown using the classification of finite simple groups that a finite group 
with a k-dimensional representation must have a proper subgroup of index at most $k^c$ (for 
some absolute constant c) or an Abelian quotient. But in both cases it is easy to construct 
product-free subsets. A stronger result that also implies a positive answer to Question 6.2 
can be found in a recent paper of Nikolov and Pyber [15]. This leaves open the question of 
whether the classification of finite simple groups is needed for solving Question 6.2. The 
results used in the solution just mentioned do seem to have that flavour, but it does not 
seem completely unreasonable to hope for a classification-free answer to the question. We 
put this as our next question. 

Question 6.3. Is there an elementary proof that if G has a non-trivial representation of 
dimension k then G has a product-free subset of size cn for some c that depends 
polynomially on $k^{-1}$? 

A closely related question is to find good bounds for the largest Haar measure of a 
product-free subset of SU(n). The methods of this paper, suitably adapted, ought to prove 
that this is at most $Cn^{-1/3}$, but the largest product-free subsets of SU(n) that we know of 
are in the spirit of the construction of Theorem 4.6 and are therefore exponentially small. 
We therefore ask the following question, with a tentative expectation that the answer is 
yes. 

Question 6.4. Does there exist a constant $c < 1$ such that every subset $A \subset SU(n)$ that 
is measurable and product-free has measure at most $c^n$? 

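It is easy to prove that no stronger bound can hold: just fix a unit vector $x_0 \in \mathbb{C}^n$ 
and let A be the set of unitary maps $\alpha$ such that $\langle x_0, \alpha x_0 \rangle < -1/2$. If $\alpha$, $\beta$ and $\alpha\beta$ all 
belong to A, then $\langle x_0, \alpha x_0 \rangle$, $\langle x_0, \alpha\beta x_0 \rangle$ and $\langle \alpha x_0, \alpha\beta x_0 \rangle$ are all less than $-1/2$ 
(the last because $\alpha$ is unitary, so that $\langle \alpha x_0, \alpha\beta x_0 \rangle = \langle x_0, \beta x_0 \rangle$). But it is an 
easy exercise to show that it is impossible to find three unit vectors with this property. 
(Just look at the square of the norm of their sum.) It is also easy to see that A has size at 
least $c^n$ for some positive constant c. 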
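
The exercise can be written out as follows. This is only a sketch: u, v and w are our own names for three unit vectors, and we read each inequality as a bound on the real part of the inner product, which is an assumption on our part since the inner products are complex in general. 

```latex
\[
0 \le \|u+v+w\|^2
  = \|u\|^2 + \|v\|^2 + \|w\|^2
    + 2\,\mathrm{Re}\bigl(\langle u,v\rangle + \langle u,w\rangle + \langle v,w\rangle\bigr)
  < 3 + 2\cdot 3\cdot\bigl(-\tfrac{1}{2}\bigr) = 0 .
\]
```

This is a contradiction, so no three unit vectors can have all pairwise inner products less than $-1/2$. 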

Several problems arise when one starts to think about the following broad question: 
which equations have solutions in large subsets of $PSL_2(q)$, or of other quasirandom groups? 
The most general answer we have been able to find is Theorem 5.3 (and the slight gener- 
alization mentioned at the end of the last section), but it is not obvious that that is the 
end of the story. Here are two questions that give some idea of what further results might 
or might not be true. The first has an easy negative answer: if A, B and C are three large 
sets, can one find $a \in A$, $b \in B$ and $c \in C$ such that $ab = ca$? The answer is no, since if 
$ab = ca$, then $b = a^{-1}ca$. Thus, b and c are conjugate, so to find a counterexample all one 
has to do is make B and C disjoint unions of conjugacy classes. 
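
The obstruction can be checked by brute force in a small example. The sketch below is purely illustrative: the choice of the symmetric group $S_4$ (which is not quasirandom), the choice of B and C as single conjugacy classes, and all function names are our own assumptions rather than anything taken from the argument above. 

```python
from itertools import permutations

def compose(p, q):
    """Return the product pq of two permutations (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugacy_class(g, group):
    """Return the set of all conjugates a^{-1} g a with a in group."""
    return frozenset(compose(compose(inverse(a), g), a) for a in group)

def has_solution(A, B, C):
    """Brute-force test for a in A, b in B, c in C with ab = ca."""
    C = set(C)
    # ab = ca is equivalent to c = a b a^{-1}.
    return any(compose(compose(a, b), inverse(a)) in C for a in A for b in B)

# Illustrative group (an assumption of this sketch): S_4 as tuples of images.
G = list(permutations(range(4)))
B = conjugacy_class((1, 0, 2, 3), G)  # the 6 transpositions
C = conjugacy_class((1, 2, 0, 3), G)  # the 8 three-cycles

print(has_solution(G, B, C))  # False: b and c would have to be conjugate
print(has_solution(G, B, B))  # True: take a to be the identity and b = c
```

Here A is the whole group, so the failure to solve $ab = ca$ comes entirely from B and C lying in different conjugacy classes. 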

However, for a very similar question it is much less clear what the answer is. If A is a 
quasirandom subset of an Abelian group, then A contains approximately the same number 
of arithmetic progressions of length 3 (defined to be sequences of the form $(a, a + d, a + 2d)$ 
with $d \ne 0$) as a random set of the same cardinality, and it also contains about the same 
number of solutions to the equation $x + y = z$. Moreover, the proofs of these two facts 
are very similar. What happens if we investigate arithmetic progressions in subsets of 
$PSL_2(q)$? 

The most obvious question is not very interesting: does every dense subset A of 
$PSL_2(q)$ contain a progression of length 3, where this is now defined to be a sequence of 
the form $(x, gx, g^2x)$? (It might be better to call this a "left progression," since it is not 
the same as a sequence of the form $(x, xg, xg^2)$.) The answer is yes, since $PSL_2(q)$ can 
be decomposed into right cosets of a cyclic subgroup of order q: we can therefore find a 
coset such that A intersects it densely and apply Roth's theorem. However, this leaves two 
questions unanswered. The first is whether A must in fact contain roughly the "expected" 
number of progressions of length 3. 

Question 6.5. Let A be a subset of $PSL_2(q)$ of density $\delta$ and let g and x be randomly 
chosen elements of $PSL_2(q)$. Is the probability that x, gx and $g^2x$ are all in A necessarily 
approximately equal to $\delta^3$? 

The second question is closely related. 

Question 6.6. Let A, B and C be three dense subsets of $PSL_2(q)$. Must there be an 
arithmetic progression $(a, b, c) \in A \times B \times C$? 

This would be interesting, since an "off-diagonal" Roth theorem of this kind is com- 
pletely false in an Abelian group. Of course, the last two questions can be asked for other 
quasirandom groups. Notice also that if $(a, b, c) = (x, gx, g^2x)$, then $c = ba^{-1}b$, and if 
$c = ba^{-1}b$ then $(a, b, c) = (a, ga, g^2a)$ for $g = ba^{-1}$. Therefore, an equivalent question to 
the last one is the following: if A, B and C are three dense subsets of $PSL_2(q)$, must there 
exist $a \in A$, $b \in B$ and $c \in C$ such that $bab = c$? (To make the question cleaner we have 
replaced A by the set of inverses of elements of A, which obviously makes no difference.) 
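
The equivalence between left progressions and the equation $c = ba^{-1}b$ can be confirmed exhaustively in a small non-Abelian group. The following is a minimal sketch under our own assumptions: the group is $S_3$, the function names are hypothetical, and trivial progressions with g equal to the identity are counted. 

```python
from itertools import permutations, product

def compose(p, q):
    """Return the product pq of two permutations (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

# Illustrative group (an assumption of this sketch): S_3.
G = list(permutations(range(3)))

def is_left_progression(a, b, c):
    """Search for g in G with (a, b, c) = (a, ga, g^2 a)."""
    return any(
        compose(g, a) == b and compose(compose(g, g), a) == c
        for g in G
    )

def satisfies_equation(a, b, c):
    """Test the single equation c = b a^{-1} b."""
    return compose(compose(b, inverse(a)), b) == c

triples = list(product(G, repeat=3))
agree = all(is_left_progression(*t) == satisfies_equation(*t) for t in triples)
count = sum(satisfies_equation(*t) for t in triples)

print(agree)  # True: the two descriptions pick out the same triples
print(count)  # 36: c is determined by (a, b), giving |G|^2 triples
```

The search over g is deliberately independent of the formula $g = ba^{-1}$, so the agreement is a genuine check rather than a restatement. 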

There is a natural bipartite graph that one can define in response to these 
problems: join x to y if there exists $b \in B$ such that $bxb = y$. If this graph is automatically 
quasirandom, then the answers to both problems are yes. But it is not clear whether it is 
quasirandom. The difficulty is that we are mixing left and right actions, which makes rep- 
resentation theory less easy to apply. (Notice that the natural bipartite graph associated 
with the equation $ab = ca$ we considered first joins x to all points of the form $a^{-1}xa$. It 
is easy to see that this graph is very far from quasirandom - indeed, it has multiple edges 
and a typical edge has very high multiplicity.) 
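
The multiplicity phenomenon is easy to see numerically. The sketch below makes several assumptions of its own: it works in $S_4$, takes A to be the whole group, and uses hypothetical function names. It counts, for a fixed x, how many a give each neighbour $a^{-1}xa$. 

```python
from collections import Counter
from itertools import permutations

def compose(p, q):
    """Return the product pq of two permutations (apply q first, then p)."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse of the permutation p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def conjugation_neighbours(x, A):
    """Multiset of neighbours a^{-1} x a of x, with one edge for each a in A."""
    return Counter(compose(compose(inverse(a), x), a) for a in A)

# Illustrative group (an assumption of this sketch): S_4, with A = G.
G = list(permutations(range(4)))
transposition = (1, 0, 2, 3)
edges = conjugation_neighbours(transposition, G)

print(len(edges))           # 6 distinct neighbours: the conjugacy class of x
print(set(edges.values()))  # {4}: every edge has multiplicity |G| / 6
```

So the 24 edges leaving x land on only 6 vertices, each with multiplicity equal to the order of the centralizer of x. 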

Acknowledgements. I am grateful to Vera Sos for drawing my attention to this problem, 
and to Laszlo Babai, Alexander Gamburd, Kiran Kedlaya, Laszlo Pyber, Vlado Nikiforov 